Paper Title

GrASP: Gradient-Based Affordance Selection for Planning

Paper Authors

Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh

Paper Abstract

Planning with a learned model is arguably a key component of intelligence. There are several challenges in realizing such a component in large-scale reinforcement learning (RL) problems. One such challenge is dealing effectively with continuous action spaces when using tree-search planning (e.g., it is not feasible to consider every action even at just the root node of the tree). In this paper we present a method for selecting affordances useful for planning -- for learning which small number of actions/options from a continuous space of actions/options to consider in the tree-expansion process during planning. We consider affordances that are goal-and-state-conditional mappings to actions/options as well as unconditional affordances that simply select actions/options available in all states. Our selection method is gradient based: we compute gradients through the planning procedure to update the parameters of the function that represents affordances. Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.
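The abstract's central mechanism is differentiating through the planner: an affordance network proposes a small set of candidate actions from the continuous space, a learned value-equivalent model scores the resulting expansions, and the gradient of the plan's value with respect to the affordance parameters improves which actions get proposed. The sketch below illustrates that idea only; it is not the authors' implementation. The network shapes, the one-step expansion, the soft-max relaxation of the tree's max, and names such as `propose_actions` and `plan_value` are all illustrative assumptions.

```python
# Minimal sketch of gradient-based affordance selection (assumed, simplified
# setup -- not the GrASP paper's code). An affordance network proposes K
# candidate continuous actions; a learned model scores a one-step expansion
# over only those candidates; gradients of the plan value flow back through
# the planner into the affordance parameters.
import jax
import jax.numpy as jnp

K = 4          # number of affordances (candidate actions) proposed per node
ACT_DIM = 2    # continuous action dimensionality (assumed)
STATE_DIM = 8  # state dimensionality (assumed)

def propose_actions(aff_params, state):
    """Affordance network: maps a state to K candidate continuous actions."""
    h = jnp.tanh(state @ aff_params["W1"] + aff_params["b1"])
    return jnp.tanh(h @ aff_params["W2"] + aff_params["b2"]).reshape(K, ACT_DIM)

def model_value(model_params, state, action):
    """Learned (value-equivalent) model's value estimate for (state, action)."""
    x = jnp.concatenate([state, action])
    return jnp.squeeze(x @ model_params["W"])

def plan_value(aff_params, model_params, state):
    """One-step 'tree expansion' restricted to the proposed actions. A soft
    max stands in for the planner's hard max so the whole procedure stays
    differentiable with respect to the affordance parameters."""
    actions = propose_actions(aff_params, state)
    q = jax.vmap(lambda a: model_value(model_params, state, a))(actions)
    return jnp.sum(jax.nn.softmax(q / 0.1) * q)

# "Gradients through the planning procedure": differentiate the plan's value
# with respect to the affordance parameters and ascend it.
grad_fn = jax.grad(plan_value, argnums=0)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
aff_params = {
    "W1": jax.random.normal(k1, (STATE_DIM, 32)) * 0.1,
    "b1": jnp.zeros(32),
    "W2": jax.random.normal(k2, (32, K * ACT_DIM)) * 0.1,
    "b2": jnp.zeros(K * ACT_DIM),
}
model_params = {"W": jax.random.normal(k3, (STATE_DIM + ACT_DIM, 1)) * 0.1}
state = jax.random.normal(k4, (STATE_DIM,))

grads = grad_fn(aff_params, model_params, state)  # update aff_params with these
```

The soft max is one common relaxation that keeps the planner differentiable; the paper's actual planner expands a multi-step search tree with a learned value-equivalent model rather than the single-step expansion assumed here.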
