Paper Title
Intrinsic Exploration as Multi-Objective RL
Paper Authors
Paper Abstract
Intrinsic motivation enables reinforcement learning (RL) agents to explore when rewards are very sparse, where traditional exploration heuristics such as Boltzmann or ε-greedy would typically fail. However, intrinsic exploration is generally handled in an ad-hoc manner, in which exploration is not treated as a core objective of the learning process; this weak formulation leads to sub-optimal exploration performance. To overcome this problem, we propose a framework based on multi-objective RL in which exploration and exploitation are optimized as separate objectives. This formulation brings the balance between exploration and exploitation to the policy level, yielding advantages over traditional methods. It also allows exploration to be controlled during learning at no extra cost. Such strategies achieve a degree of control over agent exploration that was previously unattainable with classic or intrinsic rewards. We demonstrate scalability to continuous state-action spaces by presenting EMU-Q, a method based on our framework that guides exploration towards regions of higher value-function uncertainty. EMU-Q is experimentally shown to outperform classic exploration techniques and other intrinsic RL methods on a continuous control benchmark and on a robotic manipulator.
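As a rough illustration of the abstract's central idea, treating exploration and exploitation as separate objectives balanced at the policy level, the sketch below shows a minimal scalarized action-selection rule for a discrete set of candidate actions. This is not the paper's EMU-Q algorithm; the arrays q_exploit and q_uncertainty and the weight beta are hypothetical stand-ins for an exploitation value estimate, a value-function uncertainty estimate, and a tunable exploration weight that can be changed at any point during learning.

import numpy as np

def select_action(q_exploit, q_uncertainty, beta):
    """Pick the action maximizing a weighted sum of two objectives.

    q_exploit     -- exploitation values, one per candidate action
    q_uncertainty -- value-function uncertainty estimates, same shape
    beta          -- exploration weight; beta = 0 recovers pure exploitation
    """
    # Scalarize the two objectives at the policy level, then act greedily
    # with respect to the combined score.
    scores = q_exploit + beta * q_uncertainty
    return int(np.argmax(scores))

# Example: a larger beta steers the agent towards the more uncertain action.
q_exploit = np.array([1.0, 0.8, 0.2])
q_uncertainty = np.array([0.1, 0.9, 0.3])
print(select_action(q_exploit, q_uncertainty, beta=0.0))  # -> 0 (greedy)
print(select_action(q_exploit, q_uncertainty, beta=1.0))  # -> 1 (exploratory)

Because beta is applied at action-selection time rather than being baked into a single reward signal, the exploration/exploitation trade-off can be adjusted on the fly, which is the kind of control the abstract refers to.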