What four reinforcement learning methods were used, and what parameters are learned? What is the difference between on-policy and off-policy learning? Which of the above methods are on-policy? The proposed parallel learning approaches do not use experience replay. What is the beneficial effect of experience replay? According to the authors, how is a similar effect obtained in the parallel framework?