PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

Paper title

PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards

Authors

Prasoon Goyal, Scott Niekum, Raymond J. Mooney

Abstract

Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy learning. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve the sample efficiency of policy learning, in both sparse and dense reward settings.
