Questions for "GroupViT: Semantic Segmentation Emerges from Text Supervision"
-----------------------------------------------------------------------------------------------
Please send your answers to: bechtolj@cs.uni-freiburg.de by 13:15 on 29.07.2022
Please be concise in your answers.
1. Name the benefits of training jointly on image-text data.
    
2. Describe the hierarchical grouping in your own words.
    
3. Which contrastive losses do the authors use, and what is the intuition behind each?