Title
Hierarchical Policy Learning for Mechanical Search
Authors
Abstract
Retrieving objects from clutter is a complex task that requires multiple interactions with the environment until the target object can be extracted. These interactions involve executing action primitives such as grasping or pushing, as well as prioritizing which objects to manipulate and which actions to execute. Mechanical Search (MS) is a framework for object retrieval that uses a heuristic algorithm for pushing and rule-based algorithms for high-level planning. While rule-based policies benefit from human intuition, they often perform sub-optimally. Deep reinforcement learning (RL) has shown strong performance on complex tasks such as making decisions directly from pixels, which makes it suitable for training policies in the context of object retrieval. In this work, we first formulate the MS problem in a principled way as a hierarchical POMDP. Based on this formulation, we propose a hierarchical policy learning approach for the MS problem. As a demonstration, we present two main parameterized sub-policies: a push policy and an action selection policy. When integrated into the hierarchical POMDP's policy, our proposed sub-policies increase the success rate of retrieving the target object from less than 32% to nearly 80%, while reducing the computation time for push actions from multiple seconds to less than 10 milliseconds.
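As general background, a POMDP is specified by a tuple (S, A, T, R, Ω, O, γ) of states, actions, a transition function, a reward function, observations, an observation function, and a discount factor; a hierarchical formulation additionally structures the action space into primitives with their own parameter spaces. To make the described hierarchy concrete, below is a minimal sketch of a policy in which a high-level action selection network scores the available primitives from an overhead depth image and a low-level push network outputs the parameters of a push. All class names, architectures, input shapes, and the primitive ordering here are hypothetical assumptions for illustration; the abstract does not specify the paper's actual models.

```python
import torch
import torch.nn as nn

class PushPolicy(nn.Module):
    """Maps an overhead depth image to push parameters (assumed: start x, start y, angle)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 3),
        )

    def forward(self, depth):
        return self.net(depth)  # (B, 3) push parameters

class ActionSelectionPolicy(nn.Module):
    """Scores the available action primitives given the current observation."""
    def __init__(self, n_primitives=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, n_primitives),
        )

    def forward(self, depth):
        return self.net(depth)  # (B, n_primitives) primitive logits

# Hierarchical decision: the high level picks a primitive,
# the matching low-level sub-policy outputs its parameters.
selector, pusher = ActionSelectionPolicy(), PushPolicy()
depth = torch.rand(1, 1, 64, 64)           # placeholder depth observation
primitive = selector(depth).argmax(dim=1)  # 0 = push, 1 = grasp (assumed ordering)
if primitive.item() == 0:
    push_params = pusher(depth)            # single forward pass per decision
```

One consequence of such a design is that push parameters come from a single forward pass rather than an iterative heuristic search, which is consistent with the millisecond-scale push computation times the abstract reports.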