Paper Title

Continual Reinforcement Learning with Multi-Timescale Replay

Paper Authors

Christos Kaplanis, Claudia Clopath, Murray Shanahan

Abstract

In this paper, we propose a multi-timescale replay (MTR) buffer for improving continual learning in RL agents faced with environments that are changing continuously over time at timescales that are unknown to the agent. The basic MTR buffer comprises a cascade of sub-buffers that accumulate experiences at different timescales, enabling the agent to improve the trade-off between adaptation to new data and retention of old knowledge. We also combine the MTR framework with invariant risk minimization, with the idea of encouraging the agent to learn a policy that is robust across the various environments it encounters over time. The MTR methods are evaluated in three different continual learning settings on two continuous control tasks and, in many cases, show improvement over the baselines.
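The cascade of sub-buffers described in the abstract can be pictured with a short sketch. This is a minimal illustration, not the authors' implementation: the fixed per-level capacity, the probabilistic pass-down of evicted experiences, and the uniform sampling across levels are all assumptions made here for concreteness.

```python
import random
from collections import deque

class MultiTimescaleReplayBuffer:
    """Sketch of a cascade of FIFO sub-buffers.

    When a full sub-buffer evicts its oldest experience, the
    experience is passed to the next (slower) sub-buffer with
    probability `pass_prob`, so deeper levels accumulate
    experiences over progressively longer timescales.
    """

    def __init__(self, num_levels=3, capacity=4, pass_prob=0.5, seed=0):
        self.levels = [deque() for _ in range(num_levels)]
        self.capacity = capacity
        self.pass_prob = pass_prob
        self.rng = random.Random(seed)

    def add(self, experience, level=0):
        if level >= len(self.levels):
            return  # fell off the end of the cascade: discard
        buf = self.levels[level]
        buf.append(experience)
        if len(buf) > self.capacity:
            evicted = buf.popleft()
            # Pass the evicted experience down one timescale
            # with some probability; otherwise drop it.
            if self.rng.random() < self.pass_prob:
                self.add(evicted, level + 1)

    def sample(self, batch_size):
        # Sample uniformly over all levels, so a training batch
        # mixes recent experiences with much older ones.
        pool = [e for buf in self.levels for e in buf]
        return self.rng.sample(pool, min(batch_size, len(pool)))
```

Because slower levels only see experiences that survived eviction from faster ones, sampling across the whole cascade trades off adaptation to new data against retention of old knowledge, which is the balance the abstract describes.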
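The invariant-risk-minimization idea can likewise be sketched. The scalar-multiplier ("IRMv1"-style) penalty below, the squared-error risk, and the toy per-environment data are illustrative assumptions; the paper's actual RL objective is not reproduced here.

```python
import numpy as np

def irm_penalty(preds, targets):
    """IRMv1-style penalty for a squared-error risk.

    Places a scalar dummy multiplier w (fixed at 1.0) on the
    model outputs and returns the squared gradient of the
    environment risk with respect to w.  A predictor for which
    w = 1.0 is optimal in every environment incurs zero penalty.
    """
    # risk(w) = mean((w * preds - targets)^2)
    # d risk / d w, evaluated at w = 1:
    grad = np.mean(2.0 * preds * (preds - targets))
    return grad ** 2

# Hypothetical data from two "environments" (e.g. experiences
# drawn from two different sub-buffers of the replay cascade):
env_a = (np.array([0.9, 1.1]), np.array([1.0, 1.0]))
env_b = (np.array([0.2, 1.8]), np.array([1.0, 1.0]))
total_penalty = irm_penalty(*env_a) + irm_penalty(*env_b)
```

Summing this penalty over environments and adding it to the ordinary training loss pushes the learner toward solutions that are simultaneously near-optimal in every environment, which matches the abstract's goal of a policy that is robust across the environments encountered over time.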
