Paper Title

Reinforcement Learning with Random Delays

Paper Authors

Simon Ramstedt, Yann Bouteiller, Giovanni Beltrame, Christopher Pal, Jonathan Binas

Paper Abstract

Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
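As a rough illustration of what a randomly delayed environment can look like, the sketch below wraps a Gym-style environment so that each step executes a randomly aged action and returns a randomly aged observation. The wrapper name (`RandomDelayWrapper`), the parameters `max_obs_delay` and `max_act_delay`, the independent uniform delay sampling, and the classic 4-tuple `step` API are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a randomly delayed environment wrapper, assuming a
# classic Gym-style interface (reset() -> obs, step() -> obs, reward, done, info).
import random
from collections import deque


class RandomDelayWrapper:
    """Wraps an environment so that actions and observations arrive late.

    Each step, the action actually applied is a randomly aged one from a
    buffer of recently sent actions, and the observation returned is a
    randomly aged one from a buffer of recent true observations.
    """

    def __init__(self, env, max_obs_delay=2, max_act_delay=2):
        self.env = env
        self.max_obs_delay = max_obs_delay
        self.max_act_delay = max_act_delay
        self.obs_buffer = deque(maxlen=max_obs_delay + 1)
        self.act_buffer = deque(maxlen=max_act_delay + 1)

    def reset(self):
        obs = self.env.reset()
        self.obs_buffer.clear()
        self.act_buffer.clear()
        # Pre-fill buffers so every delayed index is valid from the start.
        for _ in range(self.max_obs_delay + 1):
            self.obs_buffer.append(obs)
        for _ in range(self.max_act_delay + 1):
            self.act_buffer.append(self.env.action_space.sample())
        return self.obs_buffer[0]  # oldest (most delayed) observation

    def step(self, action):
        # The agent's new action joins the buffer; the action executed in the
        # underlying environment is an older one, chosen by a random delay.
        self.act_buffer.append(action)
        act_delay = random.randint(0, self.max_act_delay)
        applied_action = self.act_buffer[-(act_delay + 1)]

        obs, reward, done, info = self.env.step(applied_action)
        self.obs_buffer.append(obs)

        # The observation shown to the agent is similarly an older one.
        obs_delay = random.randint(0, self.max_obs_delay)
        delayed_obs = self.obs_buffer[-(obs_delay + 1)]
        info = dict(info, obs_delay=obs_delay, act_delay=act_delay)
        return delayed_obs, reward, done, info
```

A faithful delayed-environment model would also constrain how the delays evolve from one step to the next; the uniform resampling above is only meant to convey the interface an agent faces under action and observation delays.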
