Questions for "Self-Labeling via Simultaneous Clustering and Representation Learning"

1) How do the authors avoid a degenerate solution for simultaneous clustering and representation learning (i.e. putting all samples in one cluster)? (~2 sentences)
2) Describe the two steps of the algorithm. (~3-4 sentences, in your own words)
3) What are linear probes and why are they useful to assess the quality of learned representations? (~2 sentences)