Questions for "Disentangling visual and written concepts in CLIP"
-----------------------------------------------------------------------------------------------
Please send your answers to schroepp@cs.uni-freiburg.de by 14:00 on 08.02.2023.

1. What are the two main components of the CLIP model, and how is CLIP trained? Which problem of CLIP is analyzed in the seminar paper (explain in your own words)? (3-4 sentences)

2. How are visual and written concepts in CLIP disentangled in the paper? Explain the data, "architecture" and loss. (2-3 sentences)

3. Evaluation is done qualitatively (Fig. 1 and Fig. 6) by guiding a generative model to synthesize images from a text prompt. How does this work, and how are the disentangled features used? What are the conclusions of the qualitative evaluation? (3 sentences)