Paper Title

Revisiting Fundamentals of Experience Replay

Paper Authors

William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Paper Abstract

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
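
To make the two properties concrete, below is a minimal, self-contained Python sketch, not the authors' implementation: a fixed-capacity FIFO replay buffer that stores uncorrected n-step transitions, plus a toy collection loop in which a replay-ratio parameter sets how many sampling/update steps are performed per environment transition collected. All names here (`NStepReplayBuffer`, `replay_ratio`, the dummy environment) are illustrative assumptions rather than code from the paper.

```python
# Illustrative sketch only: replay capacity = buffer size, replay ratio =
# learning updates per environment transition, n-step returns left uncorrected.
import collections
import random


class NStepReplayBuffer:
    """Fixed-capacity FIFO buffer storing uncorrected n-step transitions."""

    def __init__(self, capacity, n_step, gamma):
        self.storage = collections.deque(maxlen=capacity)  # "replay capacity"
        self.n_step = n_step
        self.gamma = gamma
        self._staging = collections.deque(maxlen=n_step)   # last n raw transitions

    def add(self, state, action, reward, next_state, done):
        self._staging.append((state, action, reward, next_state, done))
        if len(self._staging) == self.n_step or done:
            # Uncorrected n-step return: G = r_t + g*r_{t+1} + ... + g^{n-1}*r_{t+n-1},
            # using whatever actions the behavior policy took (no importance weights).
            g_return, discount = 0.0, 1.0
            for _, _, r, _, d in self._staging:
                g_return += discount * r
                discount *= self.gamma
                if d:
                    break
            first_state, first_action = self._staging[0][0], self._staging[0][1]
            # `discount` is the factor applied to the bootstrap value at next_state.
            self.storage.append((first_state, first_action, g_return,
                                 next_state, done, discount))
        if done:
            self._staging.clear()

    def __len__(self):
        return len(self.storage)

    def sample(self, batch_size):
        indices = random.sample(range(len(self.storage)), batch_size)
        return [self.storage[i] for i in indices]


if __name__ == "__main__":
    buffer = NStepReplayBuffer(capacity=100_000, n_step=3, gamma=0.99)
    replay_ratio = 0.25   # one update per four transitions, as in the standard DQN setup
    batch_size, updates_owed, num_updates = 32, 0.0, 0

    state = 0
    for env_step in range(1, 2001):
        # Dummy environment interaction standing in for an agent's rollout.
        action, reward = random.randrange(4), random.random()
        next_state, done = state + 1, (env_step % 200 == 0)
        buffer.add(state, action, reward, next_state, done)
        state = 0 if done else next_state

        # The replay ratio gates how many learning updates follow each transition.
        updates_owed += replay_ratio
        while updates_owed >= 1.0 and len(buffer) >= batch_size:
            batch = buffer.sample(batch_size)  # a learner would take a gradient step here
            num_updates += 1
            updates_owed -= 1.0

    print(f"collected 2000 transitions, ran {num_updates} update steps")
```

The sliding n-step window mirrors the common practice of forming one n-step transition per environment step; a real agent would replace the dummy rollout and the sampled-batch placeholder with actual environment interaction and Q-learning updates, while the two quantities the paper studies map directly onto `capacity` and `replay_ratio` above.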
