Questions for "Training Compute-Optimal Large Language Models"
---------------------------------------------------------------

Please send your answers to dienertj@cs.uni-freiburg.de by 13:30 on 14.06.2023.

Question 1. How does the suggested scaling law differ from previous work? (~1-2 sentences)

Question 2. What is the estimated optimal number of training tokens for a model of Gopher's size (280B parameters)? (1 sentence; numbers from one approach are sufficient)

Question 3. On which benchmarks is train/test set leakage less of a concern? On which is it more of a concern? (~2-3 sentences)