Questions for "Emerging Properties in Self-Supervised Vision Transformers"

-----------------------------------------------------------------------------------------------

Please send your answers to david.hoffmann2@de.bosch.com by 13:30 on 17.01.2022.

    Question 1. What is the purpose of centering and sharpening? How do these two operations prevent collapse? (~2-3 sentences)
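
    For reference, here is a minimal NumPy sketch of the two operations on the teacher output. It assumes the formulation from the DINO paper: the teacher logits are centered by a running center c (an EMA of batch means), and the result goes through a low-temperature softmax. The student uses a higher temperature. The function names and the temperature and momentum values (tau_t=0.04, tau_s=0.1, m=0.9) are illustrative assumptions, not the official implementation.

```python
import numpy as np

def log_softmax(logits, temperature):
    # Temperature-scaled log-softmax over the last axis (numerically stable).
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def teacher_probs(teacher_logits, center, tau_t=0.04):
    # Centering: subtract the running center c before the softmax.
    # Sharpening: a low teacher temperature tau_t makes the distribution peakier.
    return np.exp(log_softmax(teacher_logits - center, tau_t))

def update_center(center, teacher_logits, m=0.9):
    # EMA of the batch mean of teacher outputs: c <- m*c + (1-m)*mean(g_t(x)).
    return m * center + (1.0 - m) * teacher_logits.mean(axis=0)

def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    # Cross-entropy H(P_t, P_s) averaged over the batch. NumPy has no autograd,
    # so the stop-gradient on the teacher branch is implicit here.
    p_t = teacher_probs(teacher_logits, center, tau_t)
    log_p_s = log_softmax(student_logits, tau_s)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())

# Toy usage with random logits (batch of 4, hypothetical output dim K=8).
rng = np.random.default_rng(0)
K = 8
student = rng.normal(size=(4, K))
teacher = rng.normal(size=(4, K))
center = np.zeros(K)
loss = dino_loss(student, teacher, center)
center = update_center(center, teacher)
```

    Note that the center is computed only from teacher outputs and is subtracted only on the teacher side. The student output is never centered.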

    Question 2. Point out three main differences from previous self-supervised learning methods such as BYOL. (~3 sentences)

    Question 3. What can be concluded from the results? Is ViT superior to ResNet for self-supervised training? If yes, for which tasks? If not, which component of DINO is responsible for the good performance? (~2 sentences)