Block-Seminar on Deep Learning
apl. Prof. Olaf Ronneberger (DeepMind)
In this seminar you will learn about recent developments in deep learning, with a focus on images and videos and their combination with other modalities such as language. The surprising emergent capabilities of large language models (like GPT-4) open up new design spaces. Many classic computer vision tasks can be translated into the language domain and can be (partially) solved there. Understanding the current capabilities, shortcomings, and approaches in the language domain will be essential for future computer vision research. The selected papers this year therefore focus on the key concepts used in today's large language models as well as approaches that combine computer vision with language.
For each paper, one participant performs a detailed investigation of the research paper and its background and gives a presentation (time limit: 35-40 minutes). The presentation is followed by a discussion with all participants about the merits and limitations of the respective paper. You will learn to read and understand contemporary research papers, to give a good oral presentation, to ask questions, and to openly discuss a research problem. The maximum number of students that can participate in the seminar is 10.
The introduction meeting (together with Thomas Brox's seminar) will be in person, while the mid-semester meeting may be online. The block seminar itself will be in person to give you the chance to practise your real-world presentation skills and to have more lively discussions.
Contact person: Silvio Galesso
GPT-4V(ision) example
Material from Thomas Brox's seminar:
- Giving a good presentation
- Proper scientific behavior
- Powerpoint template for your presentation (optional)
Schedule:
Monday 4.3
Time | ID | Paper | Comment / project page | Student | Advisor |
9:30 | B6 | Towards In-context Scene Understanding | | Yumna Ali | Sudhanshu Mittal |
10:30 | B4 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | | Arkadyuti Kundu | Artur Jesslen |
11:30 | B8 | Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction | https://lakonik.github.io/ssdnerf/ | Robin Textor-Falconi | Philipp Schroeppel |
12:30 | Lunch break | | | | |
13:30 | B9 | DynIBaR: Neural Dynamic Image-Based Rendering | | Abigail Durst | Philipp Schroeppel |
14:30 | B10 | TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | https://deepmind-tapir.github.io/ | Luca Pfrang | Johannes Dienert |
Tuesday 5.3
Time | ID | Paper | Comment / project page | Student | Advisor |
9:30 | B1 | LoRA: Low-Rank Adaptation of Large Language Models | | Lukas Liemen | Leonhard Sommer |
10:30 | B3 | Large Language Models Cannot Self-Correct Reasoning Yet | | Anurag Garg | David Hoffmann |
11:30 | B5 | Sigmoid Loss for Language Image Pre-Training | | Indrashis Das | Simon Ging |
12:30 | Lunch break | | | | |
13:30 | B7 | Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | | Tim Steinke | Silvio Galesso |
14:30 | B2 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | (we will select parts and combine it with the GPT-4V(ision) system card) | Redi Muharremi | Jelena Bratulic |