
Block-Seminar on Deep Learning

apl. Prof. Olaf Ronneberger (Google DeepMind)

In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language. The surprising emergent capabilities of large language models (like GPT-4) open up new design spaces. Many classic computer vision tasks can be translated into the language domain and (partially) solved there. Understanding the current capabilities, the shortcomings, and the approaches in the language domain will be essential for future computer vision research. The selected papers this year therefore focus on the key concepts used in today's large language models, as well as on approaches that combine computer vision with language.

For each paper there will be one person who performs a detailed investigation of the research paper and its background and gives a presentation (35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 10.

The introduction meeting (together with Thomas Brox's seminar) will be in person, while the mid-semester meeting will be online. The block seminar itself will be in person to give you the chance to practise your real-world presentation skills and to have more lively discussions.


Contact person: Arian Mousakhan

Blockseminar (2 SWS), in person.

Date: Thursday and Friday, March 20 + 21, 9:30 - 15:30

Room: SR 00-007 in building 106


Beginning: If you want to participate, attend the mandatory introduction meeting (held jointly with the Seminar on Current Works in Computer Vision) on October 16th at 14:00, register in HisInOne, and submit your paper preferences before October 21st.

Mid-Semester Lecture: Monday, January 20th, 15:00 - 17:00, via video conference. Introduction to Generative Models by apl. Prof. Olaf Ronneberger (Google DeepMind).

Recommended semester: 6 (Bachelor), any (Master)

Requirements: Background in computer vision

Remarks: This course is offered to both Bachelor and Master students. The language of this course is English. All presentations must be given in English.

Topics will be assigned for both seminars via preference voting. If there are more interested students than places, first priority will be given to students who attended the introduction meeting; after that, we follow the assignments of the HisInOne system. We want to avoid students grabbing a topic and then dropping out during the semester, so please take at least a coarse look at all available papers and make an informed decision before you commit. If you do not attend the meeting (or do not send a paper preference) but choose this seminar together with only other overbooked seminars in HisInOne, you may end up without a seminar place this semester.

Students who just need to attend (failed SL from a previous semester) need not send a paper preference; simply reply with "SL only".



   
Figure 1 from Movie Gen (Meta); videos at https://go.fb.me/MovieGenResearchVideos

Material from Thomas Brox's seminar:

Papers:

Please send your answers to the questions to the corresponding advisor before the seminar day.

The seminar has space for 10 students.

Day, Time | ID | Paper | Notes | Presenter | Advisor
Thursday, 09:30 | B4 | The Llama 3 Herd of Models | Complete description of the internals of Llama. Selected sections: 1; 2; 3.1; 3.2; 3.4; 4.1; 7; 9.2; 10. | Donat Sinani | Max Argus
Thursday, 10:30 | B2 | Movie Gen: A Cast of Media Foundation Models | Impressive video quality. Selected sections: 1; 2; 3 (without 3.1.6; 3.6.3; 3.6.4; 3.6.5; 3.7); 7.1; 7.2; 8. See also https://ai.meta.com/research/movie-gen | Omar Swelam | Rajat Sahay
Thursday, 11:30 | B1 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models | Clever pipeline using existing databases and models to build a text-to-3D model. | Upamanyu Das | Leonhard Sommer
Thursday, 12:30 | lunch break
Thursday, 13:30 | B3 | UniSim: Learning Interactive Real-World Simulators | Video generation beyond image quality, with good applications, showing that it can improve embodied-AI planning capabilities. | Nicolas von Trott | Karim Farid
Thursday, 14:30 | B11 | Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution | Self-improvement and refinement of prompts for LLMs, an important feature for open-endedness (see the illustrative sketch below the schedule). | Urs Micha Spiegelhalter | Jelena Bratulic
Friday, 09:30 | B10 | Visual SKETCHPAD: Sketching as a Visual Chain of Thought for Multimodal Language Models | Cool improvement for visual reasoning: spend more compute on "thinking" during inference. | Srikanth Shastry | Simon Ging
Friday, 10:30 | B7 | Let's Verify Step by Step | Filtering generated outputs with a process reward model (PRM) is much better than using an outcome reward model (ORM); might be one of the ingredients in the OpenAI o1 model (see the illustrative sketch below the schedule). | Aishwarya Dinni | Artur Jesslen
Friday, 11:30 | B8 | Improve Mathematical Reasoning in Language Models by Automated Process Supervision | Replaces the expensive human labels used to train process reward models with automated labels. | Ayisha Ryhana Dawood | Simon Schrodi
Friday, 12:30 | lunch break
Friday, 13:30 | B9 | Training Language Models to Self-Correct via Reinforcement Learning | Self-correction is an essential part of the OpenAI o1 model; the presented approach shows significant gains on MATH and HumanEval. | Abdul Kalam Azad | Sudhanshu Mittal
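To give a flavour of topic B11 before you read the paper, here is a minimal sketch of evolutionary prompt search. It is a heavily simplified illustration, not the authors' method: Promptbreeder additionally evolves the mutation prompts themselves, which is omitted here. The `llm` and `accuracy` callables are hypothetical stand-ins to be supplied by the caller.

```python
import random
from typing import Callable, List

def evolve_prompts(
    llm: Callable[[str], str],         # hypothetical LLM call: text in, text out
    accuracy: Callable[[str], float],  # hypothetical fitness: task accuracy achieved with a prompt
    seed_prompts: List[str],
    generations: int = 10,
    population: int = 8,
) -> str:
    """Return the fittest prompt found by a simple mutate-and-select loop."""
    pool = list(seed_prompts)
    for _ in range(generations):
        # Mutation: ask the LLM to rewrite a randomly chosen prompt.
        parent = random.choice(pool)
        child = llm("Improve this task instruction, keeping it concise:\n" + parent)
        pool.append(child)
        # Selection: keep only the best prompts for the next generation.
        pool = sorted(pool, key=accuracy, reverse=True)[:population]
    return pool[0]
```

The key design choice the paper builds on is that the mutation operator is itself an LLM call, so the system can in principle improve the very instructions that drive its own improvement.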
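Similarly, for topics B7 and B8: a minimal sketch of best-of-N filtering with a process reward model (PRM). This is only an illustration of the general idea, not the papers' implementation; `generate` and `score_step` are hypothetical stand-ins, and the convention that each reasoning step sits on its own line is assumed. An outcome reward model (ORM) would instead assign a single score to the finished solution.

```python
from typing import Callable, List

def best_of_n(
    generate: Callable[[str], str],      # hypothetical sampler: prompt -> step-by-step solution
    score_step: Callable[[str], float],  # hypothetical PRM: one reasoning step -> score in [0, 1]
    prompt: str,
    n: int = 16,
) -> str:
    """Sample n candidate solutions and return the one the PRM ranks highest.

    A PRM scores every intermediate step, so one bad step cannot hide behind
    a plausible-looking final answer. Step scores are aggregated with min():
    a solution is only as good as its weakest step.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]

    def prm_score(solution: str) -> float:
        # Assumed convention: one reasoning step per non-empty line.
        steps = [line for line in solution.split("\n") if line.strip()]
        return min(score_step(step) for step in steps) if steps else 0.0

    return max(candidates, key=prm_score)
```

B7 studies why this step-level supervision beats outcome-level supervision; B8 asks how to obtain the step labels for training `score_step` without expensive human annotation.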