Paper Title

MOORe: Model-based Offline-to-Online Reinforcement Learning

Paper Authors

Yihuan Mao, Chao Wang, Bin Wang, Chongjie Zhang

Paper Abstract

With the success of offline reinforcement learning (RL), offline-trained RL policies have the potential to be further improved when deployed online. A smooth transfer of the policy matters for safe real-world deployment, and fast adaptation of the policy plays a vital role in practical online performance improvement. To tackle these challenges, we propose a simple yet efficient algorithm, Model-based Offline-to-Online Reinforcement Learning (MOORe), which employs a prioritized sampling scheme that dynamically adjusts the offline and online data for smooth and efficient online adaptation of the policy. We provide a theoretical foundation for our algorithm design. Experimental results on the D4RL benchmark show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation, and it also significantly outperforms existing methods.
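The prioritized sampling idea from the abstract can be illustrated with a small replay buffer that mixes a fixed offline dataset with newly collected online transitions. The sketch below is only a minimal illustration: the class and parameter names, the bounded online window, and the linear schedule for the online sampling ratio are assumptions for demonstration, not the paper's exact scheme.

```python
import random


class OfflineOnlineBuffer:
    """Minimal sketch of a buffer that mixes offline and online transitions.

    The linear online-ratio schedule shown in the usage comment is an
    illustrative assumption, not the prioritization rule used by MOORe.
    """

    def __init__(self, offline_data, online_capacity=100_000):
        self.offline = list(offline_data)   # fixed offline dataset
        self.online = []                    # transitions collected online
        self.online_capacity = online_capacity

    def add_online(self, transition):
        # Keep a bounded window of the most recent online transitions.
        if len(self.online) >= self.online_capacity:
            self.online.pop(0)
        self.online.append(transition)

    def sample(self, batch_size, online_ratio):
        # online_ratio in [0, 1]: fraction of the batch drawn from online data.
        n_online = min(int(batch_size * online_ratio), len(self.online))
        n_offline = batch_size - n_online
        batch = random.sample(self.online, n_online) if n_online else []
        batch += random.choices(self.offline, k=n_offline)
        return batch


# Example usage: gradually shift sampling priority from offline to online data
# as online interaction accumulates (the schedule below is an assumption).
# buffer = OfflineOnlineBuffer(offline_dataset)
# for step in range(total_steps):
#     ratio = min(1.0, step / warmup_steps)
#     batch = buffer.sample(256, online_ratio=ratio)
```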
