Questions for the seminar paper "ComPhy: Compositional Physical Reasoning of Objects and Events from Videos"
-----------------------------------------------------------------------------------------------
Please send your answers to: schrodi@cs.uni-freiburg.de

1) What makes modeling physical objects/systems more difficult than modeling visual tasks? How does the authors' setup ensure that the model uses "physical reasoning"? (~2-3 sentences)

2) Into what components do the authors factorize their oracle model, and how do these components work together (at a high level)? (~2-3 sentences)

3) What are the types of questions in the benchmark, what defines them, and how do they differ from each other? (~3 sentences)