Block-Seminar on Deep Learning
apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language and audio. Generative models and unsupervised methods in particular have great potential to learn concepts from large non-annotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil").
For each paper, one participant performs a detailed investigation of the research paper and its background and gives a presentation (35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. At most 10 students can participate in the seminar.
The introduction meeting will take place online; there we will hold a poll on whether the seminar itself will be held online or in person.
Material
from Thomas Brox's seminar:
- Giving a good presentation
- Proper scientific behavior
- PowerPoint template for your presentation (optional)
Thursday 16.03.23:

| Time | Paper | Presenter | Supervisor | Links |
| --- | --- | --- | --- | --- |
| 09:30-10:30 | Training Compute-Optimal Large Language Models | Jin Woo Ahn | Simon Ging | |
| 10:30-11:30 | Elucidating the Design Space of Diffusion-Based Generative Models | Samir Garibov | Philipp Schroeppel | |
| 11:30-12:30 | An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Aasaipriya Chandran | Maria Bravo | |
| 12:30-13:30 | Lunch break | | | |
| 13:30-14:30 | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | Julian Eble | Max Argus | |
| 14:30-15:30 | DreamFusion: Text-to-3D using 2D Diffusion | Margarita Zhdanovich | Max Argus | see also https://dreamfusion3d.github.io/ |
Friday 17.03.23:

| Time | Paper | Presenter | Supervisor | Links |
| --- | --- | --- | --- | --- |
| 09:30-10:30 | Make-A-Video: Text-to-Video Generation without Text-Video Data | Jonghyun Ham | Sudhanshu Mittal | see also https://makeavideo.studio/ |
| 10:30-11:30 | Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions | Swathi Thiruvengadam | Simon Ging | see also https://phenaki.video/ |
| 11:30-12:30 | Imagen Video: High Definition Video Generation with Diffusion Models | Bijay Gurung | Silvio Galesso | see also https://imagen.research.google/video/ |
| 12:30-13:30 | Lunch break | | | |
| 13:30-14:30 | Data Distributional Properties Drive Emergent In-Context Learning in Transformers | Pawel Bugyi | Simon Schrodi | |
| 14:30-15:30 | Self-supervised video pretraining yields strong image representations | Ahmet Selim Canakci | David Hoffmann | |