Questions for "Cross-Modal and Hierarchical Modeling of Video and Text"
---------------------------------------------------------------------------
Please send your answers to zolfagha@cs.uni-freiburg.de by 24.07.2019 10am

1) Briefly explain Hierarchical Sequence Modeling and describe the main reason for introducing it. (~ 2-3 sentences)

2) Based on Eq. 6 and Eq. 7, how does the contrastive loss influence the embeddings? (~ 1-2 sentences)

3) Does Fig. 4 show the effectiveness of the method at matching video and text embeddings? What technique would you suggest to visualize the quality of the learned embeddings? (~ 2-3 sentences)