Questions for "Training Compute-Optimal Large Language Models"
---------------------------------------------------------------

Please send your answers to dienertj@cs.uni-freiburg.de by 13:30 on 14.06.2023.

Question 1. How does the suggested scaling law differ from previous work? (~1-2 sentences)

Question 2. What is the estimated optimal number of training tokens for a model of Gopher's size (280B parameters)? (1 sentence; numbers from one approach are sufficient)

Question 3. On which benchmarks is train/test set leakage less of a concern? On which is it more of a concern? (~2-3 sentences)