Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language. The surprising emergent capabilities of large language models (like GPT-4) open up new design spaces. Many classic computer vision tasks can be translated into the language domain and can be (partially) solved there. Understanding the current capabilities, shortcomings, and approaches in the language domain will be essential for future computer vision research. The papers selected this year therefore focus on the key concepts used in today's large language models as well as on approaches for combining computer vision with language.

Each paper is assigned to one participant, who investigates the paper and its background in detail and gives a presentation (time limit: 35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 10.

The introduction meeting (together with Thomas Brox's seminar) will be in person, while the mid-semester meeting may be online. The block seminar itself will be in person to give you the chance to practise your real-world presentation skills and to have more lively discussions.

Contact person: Silvio Galesso

(2 SWS)
(2 days in the term break, Feb-Apr, 2024)

Beginning: If you want to participate, attend the mandatory introduction meeting (held jointly with the Seminar on Current Works in Computer Vision) on October 18th, 14:15, register in HisInOne, and submit your paper preferences before October 23rd.

Mid-Semester Meeting: (date tba) Introduction to Generative models by apl. Prof. Olaf Ronneberger (DeepMind)

ECTS Credits: 4

Recommended semester: 6 (Bachelor), any (Master)
Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned for both seminars via preference voting (see above). If there are more interested students than places, first priority will be given to students who attended the introduction meeting. Afterwards, we follow the assignments of the HisInOne system. We want to avoid people grabbing a topic and then dropping out during the semester. Please take a look at all available papers to make an informed decision before you commit. The listed papers are not yet sorted by date of presentation. If you do not attend the meeting (or do not send a paper preference) but choose this seminar together with only other overbooked seminars in HisInOne, you may end up without a seminar place this semester.

Students who only need to attend (failed SL from a previous semester) need not send a paper preference; just reply with "SL only".


GPT-4V(ision) example


from Thomas Brox's seminar:


Papers will be added soon
Time | ID | Paper | Comment / project page | Student | Advisor
 | B1 | LoRA: Low-Rank Adaptation of Large Language Models | | Lukas Liemen | Leonhard Sommer
 | B2 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | (we will select parts and combine it with the GPT-4V(ision) system card) | Redi Muharremi | Jelena Bratulic
 | B3 | Large Language Models Cannot Self-Correct Reasoning Yet | | Anurag Garg | David Hoffmann
 | B4 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | | Arkadyuti Kundu | Artur Jesslen
 | B5 | Sigmoid Loss for Language Image Pre-Training | | Indrashis Das | Simon Ging
 | B6 | Towards In-context Scene Understanding | | Yumna Ali | Sudhanshu Mittal
 | B7 | Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | | Tim Steinke | Silvio Galesso
 | B8 | Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction | https://lakonik.github.io/ssdnerf/ | Robin Textor-Falconi | Philipp Schroeppel
 | B9 | DynIBaR: Neural Dynamic Image-Based Rendering | | Abigail Durst | Philipp Schroeppel
 | B10 | TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | https://deepmind-tapir.github.io/ | Luca Pfrang | Johannes Dienert