Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as audio and language. Generative models and unsupervised methods in particular have great potential to learn concepts from large non-annotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil"). This term we will specifically look into diffusion models, which recently outperformed GANs at image synthesis (see the example figure).

Each paper is assigned to one student, who performs a detailed investigation of the research paper and its background and gives a presentation on it. The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. At most 10 students can participate in the seminar.

The in-person sessions require you to be physically present. This in turn requires being vaccinated, recovered, or having a recent, valid coronavirus test (which costs money). If you do not want to meet these requirements, do not choose this seminar but one that is offered online.

(2 SWS)
August or September (To be announced.)
Contact person: Silvio Galesso

Mid-Semester Meeting: To be announced.
Introduction to Neural Networks by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester: 6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned via a preference vote (detailed information will follow). Please register for the seminar online before the first meeting. If you could not register, still come to our introductory online meeting to see whether papers are free. If there are more interested students than places, places will be assigned by a combination of the motivation you show in the first meeting and the priorities suggested by the registration system. The date of registration is NOT important. In particular, we want to avoid students claiming a topic and then dropping out during the semester. Please take at least a brief look at all available papers so you can make an informed decision before you commit. The listed papers are not yet sorted by the time of presentation.

Please get in contact with your advisor as soon as possible, and at least 4 weeks before your presentation.

Submit your presentation outline to your advisor at least 2 weeks before your presentation and meet with your advisor.

Submit your presentation slides to your advisor at least 1 week before your presentation and meet again.

All participants must read all papers and answer a few questions. The questions will be available here. The answers must be sent to the corresponding advisor before the beginning of the seminar. We highly recommend reading and understanding all papers before you start preparing your presentation.

    Images generated from text prompts (Figure 1 from the DALL-E2 paper)

Powerpoint template for your presentation (optional)


ID | Paper | Student | Comment | Advisor
-- | Vision Transformers for Dense Prediction | Christian Leininger | postponed presentation from last term |
B1 | Denoising Diffusion Probabilistic Models | Arian Mousakhan | | Tonmoy Saikia
B2 | Diffusion Models Beat GANs on Image Synthesis | Dipti Sengupta | | Tonmoy Saikia
B4 | DALL-E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents | Mohamed Ibrahim | see https://openai.com/dall-e-2/ for more examples | Max Argus
B5 | Video Diffusion Models | Karim Farid | see https://video-diffusion.github.io/ for videos | Simon Ging
B6 | Unsupervised Semantic Segmentation by Distilling Feature Correspondences | Leonhard Sommer | | David Hoffmann
B7 | Denoising Diffusion Implicit Models | Tidiane NDIR | | Simon Schrodi
B8 | High-Resolution Image Synthesis with Latent Diffusion Models | Tom Wellinger | | Silvio Galesso
B9 | Flamingo: a Visual Language Model for Few-Shot Learning | Zahra Padar | a shorter paper might be available by the block seminar; otherwise we will select a subset of sections | Sudhanshu Mittal