Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language and audio. Generative models and unsupervised methods in particular have great potential to learn concepts from large non-annotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil").

For each paper, one person performs a detailed investigation of the research paper and its background and gives a presentation (time limit: 35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. At most 10 students can participate in the seminar.

The introduction meeting will be held online; there we will hold a poll on whether the seminar itself takes place online or in person.

Contact person: Silvio Galesso

(2 SWS)
Thursday, 16th of March, 9:30 to 15:30 and
Friday, 17th of March, 9:30 to 15:30.
Beginning: If you want to participate, register for the course in HisInOne, attend the introduction meeting on October 19 at 14:00 (held jointly with the Seminar on Current Works in Computer Vision), and send an email with your name and your paper priorities (B1 - B9, favorite paper first) to Silvio Galesso by October 24.

Mid-Semester Meeting: Monday, 9th of January: Introduction to Generative Models by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester: 6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned for both seminars via preference voting (see above). If there are more interested students than places, first priority is given to students who attended the meeting on Oct. 19; after that, we follow the assignments in HisInOne. We want to avoid students claiming a topic and then dropping out during the semester, so please take at least a brief look at all available papers to make an informed decision before you commit. The listed papers are not yet sorted by presentation date. If you do not attend the meeting (or do not send a paper preference) but choose this seminar together with only other overbooked seminars in HisInOne, you may end up without a seminar place this semester.

Students who only need to attend (failed SL from a previous semester) need not send a paper preference; just reply with "SL only".

Video generated from text prompts (Figure 1 from the Phenaki paper):
  1. A photorealistic teddy bear is swimming in the ocean at San Francisco
  2. The teddy bear goes under water
  3. The teddy bear keeps swimming under the water with colorful fishes
  4. A panda bear is swimming under water


Schedule (from Thomas Brox's seminar):


Thursday 16.03.23:

09:30-10:30 | Training Compute-Optimal Large Language Models | Jin Woo Ahn | Simon Ging
10:30-11:30 | Elucidating the Design Space of Diffusion-Based Generative Models | Samir Garibov | Philipp Schroeppel
11:30-12:30 | An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Aasaipriya Chandran | Maria Bravo
12:30-13:30 | Lunch break
13:30-14:30 | Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors | Julian Eble | Max Argus
14:30-15:30 | DreamFusion: Text-to-3D using 2D Diffusion (see also https://dreamfusion3d.github.io/) | Margarita Zhdanovich | Max Argus

Friday 17.03.23:

09:30-10:30 | Make-A-Video: Text-to-Video Generation without Text-Video Data (see also https://makeavideo.studio/) | Jonghyun Ham | Sudhanshu Mittal
10:30-11:30 | Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions (see also https://phenaki.video/) | Swathi Thiruvengadam | Simon Ging
11:30-12:30 | Imagen Video: High Definition Video Generation with Diffusion Models (see also https://imagen.research.google/video/) | Bijay Gurung | Silvio Galesso
12:30-13:30 | Lunch break
13:30-14:30 | Data Distributional Properties Drive Emergent In-Context Learning in Transformers | Pawel Bugyi | Simon Schrodi
14:30-15:30 | Self-supervised video pretraining yields strong image representations | Ahmet Selim Canakci | David Hoffmann