Paper Title

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Paper Authors

Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

Paper Abstract

Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks, limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL. We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks. Our findings suggest that data generation is as important as algorithmic advances for offline RL and hence requires careful consideration from the community. Code and data can be found at https://github.com/denisyarats/exorl.
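
To make the pipeline concrete, below is a minimal sketch (in Python, using a toy 2-D point-mass setting as an assumption) of the relabeling step described in the abstract: reward-free exploratory transitions are stamped with a downstream task reward and only then handed to a standard off-policy learner. The `Transition`, `relabel`, and `goal_reward` names are illustrative and are not the API of the linked repository.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical illustration of the ExORL pipeline described above:
# (1) collect reward-free exploratory transitions,
# (2) relabel them with the downstream task's reward function,
# (3) pass the relabeled dataset to a vanilla off-policy RL algorithm.

@dataclass
class Transition:
    obs: np.ndarray
    action: np.ndarray
    next_obs: np.ndarray
    reward: float = 0.0  # unknown at exploration time (reward-free data)

def relabel(dataset, reward_fn):
    """Stamp each reward-free transition with the downstream task reward."""
    return [
        Transition(t.obs, t.action, t.next_obs,
                   reward=reward_fn(t.obs, t.action, t.next_obs))
        for t in dataset
    ]

# Toy downstream reward: "reach the origin" in a 2-D point-mass domain.
def goal_reward(obs, action, next_obs):
    return -float(np.linalg.norm(next_obs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for data produced by an unsupervised exploration algorithm.
    exploratory_data = [
        Transition(rng.normal(size=2), rng.normal(size=2), rng.normal(size=2))
        for _ in range(1000)
    ]
    offline_dataset = relabel(exploratory_data, goal_reward)
    # offline_dataset would now be used to train an off-the-shelf
    # off-policy agent (e.g., TD3 or SAC) purely from this fixed buffer.
    print(len(offline_dataset), offline_dataset[0].reward)
```

In the actual method the exploratory transitions come from an unsupervised exploration algorithm rather than random noise, and the final step trains a standard off-policy agent on the fixed, relabeled buffer without any offline-specific modifications.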
