GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

Paper title

GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

Authors

Grgur Kovač, Adrien Laversanne-Finot, Pierre-Yves Oudeyer

Abstract

Designing agents capable of autonomously learning a wide range of skills is critical to increasing the scope of reinforcement learning. Doing so would both increase the diversity of learned skills and reduce the burden of manually designing a reward function for each skill. Self-supervised agents that set their own goals and try to maximize the diversity of those goals have shown great promise toward this end. However, a known limitation of agents that try to maximize the diversity of sampled goals is that they tend to be attracted to noise or, more generally, to parts of the environment that cannot be controlled (distractors). When agents have access to predefined goal features or expert knowledge, absolute Learning Progress (ALP) provides a way to distinguish between regions that can be controlled and those that cannot. However, those methods often fall short when agents are provided only with raw sensory inputs such as images. In this work, we extend those concepts to unsupervised image-based goal exploration. We propose a framework that allows agents to autonomously identify and ignore noisy distracting regions while searching for novelty in the learnable regions, both improving overall performance and avoiding catastrophic forgetting. Our framework can be combined with any state-of-the-art novelty-seeking goal exploration approach. We construct a rich 3D image-based environment with distractors. Experiments in this environment show that agents using our framework successfully identify interesting regions of the environment, resulting in drastically improved performance. The source code is available at https://sites.google.com/view/grimgep.
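
The abstract does not say how ALP is computed or how it steers goal sampling. The following is a minimal sketch, not the authors' implementation. It assumes that ALP is the absolute difference between mean goal-reaching performance over two consecutive windows, and that the goal space has already been split into regions. The function names, the window size, and the `epsilon` exploration rate are all hypothetical choices made for illustration.

```python
import random


def absolute_learning_progress(history, window=10):
    """Assumed ALP estimate: |mean(recent window) - mean(previous window)|.

    `history` is a sequence of per-episode performance scores for one region.
    Returns 0.0 when there are fewer than 2 * window scores.
    """
    if len(history) < 2 * window:
        return 0.0
    scores = list(history)
    recent = scores[-window:]
    older = scores[-2 * window:-window]
    return abs(sum(recent) / window - sum(older) / window)


def sample_region(histories, window=10, epsilon=0.2, rng=random):
    """Pick a region index with probability proportional to its ALP.

    With probability `epsilon`, or when every region has zero ALP,
    fall back to a uniform choice (hypothetical exploration rule).
    """
    alps = [absolute_learning_progress(h, window) for h in histories]
    if sum(alps) == 0.0 or rng.random() < epsilon:
        return rng.randrange(len(histories))
    return rng.choices(range(len(histories)), weights=alps, k=1)[0]


# Toy data: a distractor whose score fluctuates without any trend,
# and a learnable region whose score improves over time.
distractor = [0.0, 1.0] * 10
learnable = [0.0] * 10 + [1.0] * 10
chosen = sample_region([distractor, learnable], epsilon=0.0, rng=random.Random(0))
```

In this toy case the distractor's windowed means are equal, so its ALP is zero and the learnable region is selected. Under this reading, a novelty-seeking method would then sample a concrete goal inside the chosen region.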
