Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language and audio. Generative models and unsupervised methods in particular have great potential to learn concepts from large non-annotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil").

Each paper is assigned to one participant, who investigates the paper and its background in detail and gives a presentation. The presentation is followed by a discussion with all participants about the merits and limitations of the paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 10.

The introduction meeting will be held online. There we will hold a poll on whether the seminar itself takes place online or in person.

Contact person: Silvio Galesso

(2 SWS)
tba (two days between mid February and mid April 2023)
Beginning: If you want to participate, register for the course in HisInOne, attend the introduction meeting on October 19 at 14:00 (held jointly with the Seminar on Current Works in Computer Vision), and send an email with your name and your paper priorities (B1 - B9, favorite paper first) to Silvio Galesso by October 24.

Mid-Semester Meeting: tba (before Christmas), with an introduction to generative models by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester:

6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned for both seminars via a preference voting (see above). If there are more interested students than places, first priority will be given to students who attended the meeting on Oct. 19. Afterwards, we follow the assignments of the HisInOne system. We want to avoid people taking a topic and then dropping out during the semester. Please have a quick look at all available papers to make an informed decision before you commit. The listed papers are not yet sorted by the date of presentation. If you do not attend the meeting (or do not send a paper preference) and choose this seminar in HisInOne together with only other overbooked seminars, you may end up without a seminar place this semester.

Students who only need to attend (failed SL from a previous semester) need not send a paper preference. They should just reply with "SL only".

    Video generated from text prompts (Figure 1 from the Phenaki paper):
  1. A photorealistic teddy bear is swimming in the ocean at San Francisco
  2. The teddy bear goes under water
  3. The teddy bear keeps swimming under the water with colorful fishes
  4. A panda bear is swimming under water

PowerPoint template for your presentation (optional)


ID Paper Comment Student Advisor
B1 Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions see also https://phenaki.video/ Swathi Thiruvengadam
B2 Elucidating the Design Space of Diffusion-Based Generative Models Samir Garibov
B3 An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion Aasaipriya Chandran
B4 Imagen Video: High Definition Video Generation with Diffusion Models see also https://imagen.research.google/video/ Bijay Gurung
B5 DreamFusion: Text-to-3D using 2D Diffusion see also https://dreamfusion3d.github.io/ Margarita Zhdanovich
B6 Training Compute-Optimal Large Language Models (paper will be shortened to typical conference paper size) Jin Woo Ahn
B7 Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors Julian Eble
B8 Data Distributional Properties Drive Emergent In-Context Learning in Transformers Paweł Bugyi
B9 Make-A-Video: Text-to-Video Generation without Text-Video Data see also https://makeavideo.studio/ Jonghyun Ham
B10 Self-supervised video pretraining yields strong image representations Ahmet Selim Canakci