Paper Title

Reinforcement Learning with Random Delays

Paper Authors

Simon Ramstedt, Yann Bouteiller, Giovanni Beltrame, Christopher Pal, Jonathan Binas

Paper Abstract

Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
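As a rough illustration of what a randomly delayed environment can look like, the sketch below wraps a Gym-style environment so that each step executes a randomly aged action and returns a randomly aged observation. The wrapper name (`RandomDelayWrapper`), the parameters `max_obs_delay` and `max_act_delay`, the independent uniform delay sampling, and the classic 4-tuple `step` API are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a randomly delayed environment wrapper, assuming a
# classic Gym-style interface (reset() -> obs, step() -> obs, reward, done, info).
import random
from collections import deque


class RandomDelayWrapper:
    """Wraps an environment so that actions and observations arrive late.

    Each step, the action actually applied is a randomly aged one from a
    buffer of recently sent actions, and the observation returned is a
    randomly aged one from a buffer of recent true observations.
    """

    def __init__(self, env, max_obs_delay=2, max_act_delay=2):
        self.env = env
        self.max_obs_delay = max_obs_delay
        self.max_act_delay = max_act_delay
        self.obs_buffer = deque(maxlen=max_obs_delay + 1)
        self.act_buffer = deque(maxlen=max_act_delay + 1)

    def reset(self):
        obs = self.env.reset()
        self.obs_buffer.clear()
        self.act_buffer.clear()
        # Pre-fill buffers so every delayed index is valid from the start.
        for _ in range(self.max_obs_delay + 1):
            self.obs_buffer.append(obs)
        for _ in range(self.max_act_delay + 1):
            self.act_buffer.append(self.env.action_space.sample())
        return self.obs_buffer[0]  # oldest (most delayed) observation

    def step(self, action):
        # The agent's new action joins the buffer; the action executed in the
        # underlying environment is an older one, chosen by a random delay.
        self.act_buffer.append(action)
        act_delay = random.randint(0, self.max_act_delay)
        applied_action = self.act_buffer[-(act_delay + 1)]

        obs, reward, done, info = self.env.step(applied_action)
        self.obs_buffer.append(obs)

        # The observation shown to the agent is similarly an older one.
        obs_delay = random.randint(0, self.max_obs_delay)
        delayed_obs = self.obs_buffer[-(obs_delay + 1)]
        info = dict(info, obs_delay=obs_delay, act_delay=act_delay)
        return delayed_obs, reward, done, info
```

A faithful delayed-environment model would also constrain how the delays evolve from one step to the next; the uniform resampling above is only meant to convey the interface an agent faces under action and observation delays.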
