Block-Seminar on Deep Learning
apl. Prof. Olaf Ronneberger (Google DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language. The surprising emergent capabilities of large language models (like GPT-4) open up new design spaces. Many classic computer vision tasks can be translated into the language domain and (partially) solved there. Understanding the current capabilities, shortcomings, and approaches in the language domain will be essential for future computer vision research. The selected papers this year therefore focus on the key concepts used in today's large language models as well as on approaches that combine computer vision with language.
For each paper, one participant will perform a detailed investigation of the research paper and its background and give a presentation (time limit: 35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 10.
The introduction meeting (together with Thomas Brox's seminar) will be in person, while the mid-semester meeting will be online. The block seminar itself will be in person to give you the chance to practise your real-world presentation skills and to have more lively discussions.
Contact person: Arian Mousakhan
Figure 1 from Movie Gen (Meta). Videos at https://go.fb.me/MovieGenResearchVideos
Material
from Thomas Brox's seminar:
- Giving a good presentation
- Proper scientific behavior
- PowerPoint template for your presentation (optional)
Papers:
Please send your answers to the questions to the corresponding advisor before the seminar day. The seminar has space for 10 students.
| Day | Time | ID | Paper | Notes | Presenter | Advisor |
| --- | --- | --- | --- | --- | --- | --- |
| Thursday | 09:30 | B4 | The Llama 3 Herd of Models | Complete description of the internals of Llama. Selected sections: 1; 2; 3.1; 3.2; 3.4; 4.1; 7; 9.2; 10 | Donat Sinani | Max Argus |
| Thursday | 10:30 | B2 | Movie Gen: A Cast of Media Foundation Models | Impressive video quality. Selected sections: 1; 2; 3 (without 3.1.6; 3.6.3; 3.6.4; 3.6.5; 3.7); 7.1; 7.2; 8. See also https://ai.meta.com/research/movie-gen | Omar Swelam | Rajat Sahay |
| Thursday | 11:30 | B1 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models | Clever pipeline using existing databases and models to build a text-to-3D model | Upamanyu Das | Leonhard Sommer |
| Thursday | 12:30 | | Lunch break | | | |
| Thursday | 13:30 | B3 | UniSim: Learning Interactive Real-World Simulators | Video generation beyond image quality, with good applications; shows that it can improve embodied-AI planning capabilities | Nicolas von Trott | Karim Farid |
| Thursday | 14:30 | B11 | Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution | Self-improvement and refinement of prompts for LLMs; an important feature for open-endedness | Urs Micha Spiegelhalter | Jelena Bratulic |
| Friday | 09:30 | B10 | Visual SKETCHPAD: Sketching as a Visual Chain of Thought for Multimodal Language Models | Cool improvement for visual reasoning: spend more compute on "thinking" during inference | Srikanth Shastry | Simon Ging |
| Friday | 10:30 | B7 | Let's Verify Step by Step | Filtering generated outputs with a process reward model (PRM) is much better than using an outcome reward model (ORM). Might be one of the ingredients in the OpenAI o1 model. | Aishwarya Dinni | Artur Jesslen |
| Friday | 11:30 | B8 | Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Replaces expensive human labels for training process reward models with automated labels | Ayisha Ryhana Dawood | Simon Schrodi |
| Friday | 12:30 | | Lunch break | | | |
| Friday | 13:30 | B9 | Training Language Models to Self-Correct via Reinforcement Learning | Self-correction is an essential part of the OpenAI o1 model. The presented approach shows significant gains on MATH and HumanEval. | Abdul Kalam Azad | Sudhanshu Mittal |