Questions for "The Break-Even Point on Optimization Trajectories of Deep Neural Networks"
-----------------------------------------------------------------------------------------

Please send your answers to marrakch@cs.uni-freiburg.de by 10:15 on 09.07.2020.

1. How would the curvature of the loss surface and the covariance of mini-batch gradients relate to the generalization capabilities of a network? (2 sentences)

2. The authors report a lower variance of mini-batch gradients for lower batch sizes, which is unexpected behavior. Why is it unexpected, and why should the claim hold? (3 sentences)

3. According to the first conjecture, the maximum attained eigenvalues are smaller for a larger learning rate or a smaller batch size. Would you train a network with a very small batch size (up to 4) and a large learning rate (~0.1)? Explain why. (2-3 sentences)
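As a starting point for Question 2, the following is a minimal sketch (toy least-squares data, hypothetical setup, not from the paper) of the classical expectation: at a fixed parameter value, the variance of the mini-batch gradient shrinks as the batch size grows. This is the baseline against which the paper's observation along the actual SGD trajectory is surprising.

```python
# Hypothetical toy example: measure mini-batch gradient variance at a
# FIXED weight vector. Classically, smaller batches give noisier
# gradient estimates (variance roughly ~ 1/batch_size).
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # gradients are evaluated at this fixed point

def minibatch_grad(batch_idx):
    """Gradient of 0.5 * mean squared error on one mini-batch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return Xb.T @ (Xb @ w - yb) / len(batch_idx)

def grad_variance(batch_size, trials=500):
    """Average per-coordinate variance of the mini-batch gradient."""
    grads = np.stack([
        minibatch_grad(rng.choice(n, size=batch_size, replace=False))
        for _ in range(trials)
    ])
    return grads.var(axis=0).mean()

for bs in (4, 32, 256):
    print(f"batch size {bs:4d}: gradient variance ~ {grad_variance(bs):.4f}")
```

Note that this sketch holds the weights fixed; the paper's claim concerns the variance observed along the optimization trajectory, where the choice of batch size also changes which region of the loss surface SGD visits.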