Questions for the seminar paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"
-----------------------------------------------------------------------------------------------
Please send your answers to: dienertj@cs.uni-freiburg.de

1) At a high level: Why do we need some form of tracking? (~1 sentence)
2) Briefly describe how the Sinkhorn-Knopp algorithm works. (~2-3 sentences)
3) How does the number of attention heads relate to the number of tracked objects? (~1 sentence)