Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as audio and language. Generative models and unsupervised methods in particular have great potential to learn concepts from large non-annotated databases (see the DeepMind blog post "Unsupervised learning: the curious pupil"). Each paper is assigned to one participant, who investigates the paper and its background in detail and gives a presentation. The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 12.

Due to the COVID-19 pandemic, the seminar is planned to be held partially online. The introductory lecture will take place in person, together with the Seminar on Current Works in Computer Vision, on Monday 25.10.2021, 14:00-15:30. The student presentations and the mid-semester meeting are planned to be online; you will be informed of any changes. Presentations and paper discussions will take place via a teleconferencing tool with screen sharing.

The in-person sessions require you to be physically present. This in turn requires that you are vaccinated, recovered, or have a recent, valid COVID-19 test (which costs money). If you do not want to meet these requirements, do not choose this seminar; choose one that is offered online instead.

If you want to participate, register for the course in HisInOne, attend the introductory meeting on October 25 at 14:00, and send an email with your name and your paper priorities (B1-B12, favorite paper first) to Maria Bravo before October 26.

(2 SWS)
To be announced.
Online via teleconference.
Contact person: Maria Bravo

Beginning: Monday 25.10, 14:00-15:30, Room 52-2-17
Will be held jointly with the Seminar on Current Works in Computer Vision

Mid-Semester Meeting: To be announced
Video conference link will be in the e-mails
Introduction to Neural Networks by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester:

6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned via preference voting (detailed information will follow). Please register for the seminar online before the first meeting. If you could not register, still come to our introductory meeting to see whether any papers are free. If there are more interested students than places, places will be assigned based on a combination of the motivation shown in the first meeting and the priority suggestions of the system. The date of registration is NOT important. In particular, we want to avoid people grabbing a topic and then dropping out during the semester. Please take a brief look at all available papers to make an informed decision before you commit. The listed papers are not yet sorted by presentation date.

Please get in contact with your advisor as soon as possible, and no later than 4 weeks before your presentation.

Submit your presentation outline to your advisor at least 2 weeks before your presentation and meet with your advisor.

Submit your presentation slides to your advisor at least 1 week before your presentation and meet again.

All participants must read all papers and answer a few questions. The questions will be available here. The answers must be sent to the corresponding advisor by 23:59 on 20.03.2022. We highly recommend reading and understanding all papers before you start preparing your presentation.

Slides of the introductory lecture
Powerpoint template for your presentation (optional)


| Time | Paper | Student | Advisor | Slides |
| | B4 Pix2seq: A Language Modeling Framework for Object Detection | Chen Li | Sudhanshu Mittal | |
| | B5 MLP-Mixer: An all-MLP Architecture for Vision | Jack Brons | Osama Makansi | |
| | B7 Vision Transformers for Dense Prediction | Christian Leininger | Max Argus | |
| | B8 CoAtNet: Marrying Convolution and Attention for All Data Sizes | Niket Ahuja | Jan Bechtold | |
| | B9 SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition | Siyu Chen | Tonmoy Saikia | |
| | B10 Powerpropagation: A sparsity inducing weight reparameterisation | William Jobson Pargeter | Simon Ging | |
| | B11 Perceiver IO: A General Architecture for Structured Inputs & Outputs | Till Fetzer | Max Argus | |
| | B12 Multimodal Few-Shot Learning with Frozen Language Models | Yubo Wang | Maria Bravo | |