Questions for the seminar paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"
-----------------------------------------------------------------------------------------------

Please send your answers to:

1) On a high level: Why do we need some form of tracking? (~1 sentence)
2) Briefly describe how the Sinkhorn-Knopp algorithm works. (~2-3 sentences)
3) How does the number of attention heads relate to the number of tracked objects? (~1 sentence)