Paper Title

MOORe: Model-based Offline-to-Online Reinforcement Learning

Paper Authors

Yihuan Mao, Chao Wang, Bin Wang, Chongjie Zhang

Paper Abstract

With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment, and fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement Learning (MOORe), which employs a prioritized sampling scheme that dynamically adjusts the offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm design. Experimental results on the D4RL benchmark show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation, and it also significantly outperforms existing methods.
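The prioritized sampling idea from the abstract can be illustrated with a small replay buffer that mixes a fixed offline dataset with newly collected online transitions. The sketch below is only a minimal illustration: the class and parameter names, the bounded online window, and the linear schedule for the online sampling ratio are assumptions for demonstration, not the paper's exact scheme.

```python
import random


class OfflineOnlineBuffer:
    """Minimal sketch of a buffer that mixes offline and online transitions.

    The linear online-ratio schedule shown in the usage comment is an
    illustrative assumption, not the prioritization rule used by MOORe.
    """

    def __init__(self, offline_data, online_capacity=100_000):
        self.offline = list(offline_data)   # fixed offline dataset
        self.online = []                    # transitions collected online
        self.online_capacity = online_capacity

    def add_online(self, transition):
        # Keep a bounded window of the most recent online transitions.
        if len(self.online) >= self.online_capacity:
            self.online.pop(0)
        self.online.append(transition)

    def sample(self, batch_size, online_ratio):
        # online_ratio in [0, 1]: fraction of the batch drawn from online data.
        n_online = min(int(batch_size * online_ratio), len(self.online))
        n_offline = batch_size - n_online
        batch = random.sample(self.online, n_online) if n_online else []
        batch += random.choices(self.offline, k=n_offline)
        return batch


# Example usage: gradually shift sampling priority from offline to online data
# as online interaction accumulates (the schedule below is an assumption).
# buffer = OfflineOnlineBuffer(offline_dataset)
# for step in range(total_steps):
#     ratio = min(1.0, step / warmup_steps)
#     batch = buffer.sample(256, online_ratio=ratio)
```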
