#######################################################################
#                                                                     #
#                            Questions for                            #
#  "Data-Dependent Initialization of Convolutional Neural Networks"  #
# by Philipp Kraehenbuehl, Carl Doersch, Jeff Donahue, Trevor Darrell #
#                       via arXiv:1511.06856v1                        #
#                                                                     #
#  Answers due **June 8, 2016 at 10am** to mayern@cs.uni-freiburg.de  #
#                                                                     #
#######################################################################

1. The authors sample some input images for their data-dependent weight
   initialization procedure. Can you think of circumstances in which
   this might go wrong? Hint: think about what assumptions are made
   about the data pool. (1-2 sentences)

2. Prove that the claim in the first paragraph on page 2 is correct,
   i.e. that changing the weights and biases in the specified way does
   not change the function computed by a Convolution-ReLU-Convolution
   network. Use X_(i+1) := weights_i * X_i + biases_i as the function
   of convolutional layer i, and X_(i+1) := max(0, X_(i+1)) for a ReLU
   after layer i (i.e. you may ignore the ReLU's slope factor).
   (2-3 lines of math; no need to be formally perfect)

3. Does (2.) work if we use a sigmoid function instead of the ReLU?
   What if we use the square function (x^2)? (1 sentence)
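
Optional: as a numerical sanity check for Question 2 (a check, not the
proof the question asks for), the Python sketch below models each
convolution as a dense matrix product -- a simplifying assumption,
since a convolution is a linear map, so the argument carries over
unchanged. It also assumes the transformation "specified" in the paper
is the usual positive rescaling: multiply weights_i and biases_i by a
constant c > 0 and divide weights_(i+1) by c.

    import numpy as np

    rng = np.random.default_rng(0)

    def net(W1, b1, W2, b2, x):
        # Convolution-ReLU-Convolution with dense matrices standing in
        # for the (linear) convolutions:
        #   X_2 = max(0, W1 @ x + b1);  output = W2 @ X_2 + b2
        return W2 @ np.maximum(0.0, W1 @ x + b1) + b2

    # Random small network and input (shapes are arbitrary choices)
    W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
    W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
    x = rng.normal(size=4)

    c = 3.7  # any positive constant; positivity is essential
    print(np.allclose(net(W1, b1, W2, b2, x),
                      net(c * W1, c * b1, W2 / c, b2, x)))  # True

The check hinges on max(0, c*z) = c * max(0, z) for c > 0; trying a
few negative values of c, or swapping in a nonlinearity of your choice,
shows where this identity breaks and makes a useful warm-up for
Question 3.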