Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as audio and language. Generative models and unsupervised methods in particular have great potential to learn concepts from large unannotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil"). This term we will look specifically into diffusion models, which have recently outperformed GANs for image synthesis (see the example images below).
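To give a flavour of the topic before the papers themselves: the core idea behind diffusion models is a fixed forward process that gradually turns an image into Gaussian noise, which a neural network then learns to reverse. The following is a minimal, illustrative sketch of the forward (noising) process in the DDPM style; the schedule values and array sizes are toy assumptions, not the settings used in any of the seminar papers.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (DDPM-style forward process)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]      # cumulative product up to step t
    noise = rng.standard_normal(x0.shape)  # fresh Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # toy linear noise schedule, 1000 steps
x0 = rng.standard_normal((8, 8))           # a toy "image"
x_late = forward_diffuse(x0, 999, betas, rng)  # after the last step: almost pure noise
```

At the final step the cumulative signal weight is vanishingly small, so `x_late` is essentially pure noise; training then amounts to learning to undo these steps one at a time.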

For each paper, one participant will investigate the research paper and its background in detail and give a presentation. The presentation is followed by a discussion with all participants about the merits and limitations of the paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. At most 10 students can participate in the seminar.

The in-person sessions require you to be physically present. This in turn requires being vaccinated, recovered, or having a recent, valid Corona test (which costs money). If you do not want to meet these requirements, please do not choose this seminar but one that is offered online.

Contact person: Silvio Galesso

(2 SWS)
Wednesday 28 September, 9:30 to 15:30
Thursday 29 September, 9:30 to 15:30
Mid-Semester Meeting: 21.06.2022, 14:00
Introduction to Neural Networks by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester:

6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned via preference voting (detailed information will follow). Please register for the seminar online before the first meeting. If you could not register, still come to our introductory online meeting to see whether papers are still available. If there are more interested students than places, places will be assigned by a combination of the motivation you show in the first meeting and the priority suggestions of the registration system. The date of registration is NOT important. In particular, we want to avoid participants claiming a topic and then dropping out during the semester. Please take at least a brief look at all available papers so you can make an informed decision before you commit. The listed papers are not yet ordered by presentation time.

Please get in contact with your advisor as soon as possible, and at least 4 weeks before your presentation.

Submit your presentation outline to your advisor at least 2 weeks before your presentation and meet with your advisor.

Submit your presentation slides to your advisor at least 1 week before your presentation and meet again.

Each participant should give a talk of approximately 30 minutes, which will be followed by a collective discussion.

All participants must read all papers and answer a few questions. The questions will be available here. The answers must be sent to the corresponding advisor before the beginning of the seminar. We highly recommend reading and understanding all papers before you start preparing your presentation.

    Images generated from text prompts (Figure 1 from the DALL-E 2 paper)

Powerpoint template for your presentation (optional)



Wednesday 28.09.22:

09:30-10:30 Denoising Diffusion Probabilistic Models (presenter: Arian Mousakhan, advisor: Tonmoy Saikia)
10:30-11:30 Denoising Diffusion Implicit Models (presenter: Tidiane NDIR, advisor: Simon Schrodi)
11:30-12:30 Diffusion Models Beat GANs on Image Synthesis (presenter: Dipti Sengupta, advisor: Tonmoy Saikia)
12:30-13:30 Lunch break
13:30-14:30 GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (presenter: Yanling Zou, advisor: Simon Ging)
14:30-15:30 Flamingo: a Visual Language Model for Few-Shot Learning (presenter: Zahra Padar, advisor: Sudhanshu Mittal; a version with highlighted sections that we want to discuss in the seminar is available)

Thursday 29.09.22:

09:30-10:30 High-Resolution Image Synthesis with Latent Diffusion Models (presenter: Tom Wellinger, advisor: Silvio Galesso)
10:30-11:30 DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents (presenter: Mohamed Ibrahim, advisor: Max Argus; see https://openai.com/dall-e-2/ for more examples)
11:30-12:30 Video Diffusion Models (presenter: Karim Farid, advisor: Simon Ging; see https://video-diffusion.github.io/ for videos)
12:30-13:30 Lunch break
13:30-14:30 Vision Transformers for Dense Prediction (presenter: Christian Leininger, advisor: Max Argus; postponed presentation from last term)
14:30-15:30 Unsupervised Semantic Segmentation by Distilling Feature Correspondences (presenter: Leonhard Sommer, advisor: David Hoffmann)