Paper Title

Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects

Authors

Cheng Chi, Benjamin Burchfiel, Eric Cousineau, Siyuan Feng, Shuran Song

Abstract

This paper tackles the task of goal-conditioned dynamic manipulation of deformable objects. This task is highly challenging due to its complex dynamics (introduced by object deformation and high-speed action) and strict task requirements (defined by a precise goal specification). To address these challenges, we present Iterative Residual Policy (IRP), a general learning framework applicable to repeatable tasks with complex dynamics. IRP learns an implicit policy via delta dynamics: instead of modeling the entire dynamical system and inferring actions from that model, IRP learns delta dynamics that predict the effect of a delta action on the previously observed trajectory. When combined with adaptive action sampling, the system can quickly optimize its actions online to reach a specified goal. We demonstrate the effectiveness of IRP on two tasks: whipping a rope to hit a target point and swinging a cloth to reach a target pose. Despite being trained only in simulation on a fixed robot setup, IRP is able to generalize efficiently to noisy real-world dynamics, new objects with unseen physical properties, and even different robot hardware embodiments, demonstrating excellent generalization relative to alternative approaches. Video is available at https://youtu.be/7h3SZ3La-oA
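The abstract's core loop (predict the effect of a delta action on the last observed trajectory, sample candidate deltas, execute the best one, and repeat with a shrinking sampling range) can be sketched as below. This is a hedged toy illustration, not the paper's implementation: `delta_dynamics` and `execute_action` are hypothetical stand-ins for the learned model and the real system, and the dynamics here are a trivial 1-D linear response rather than a rope or cloth.

```python
# Toy sketch of an IRP-style iterative loop with adaptive action sampling.
# All names (delta_dynamics, execute_action, irp_step) are illustrative
# assumptions, not the paper's actual API.
import random

random.seed(0)  # deterministic toy run

def delta_dynamics(observed_endpoint, delta_action):
    """Stand-in for the learned delta-dynamics model: predicts the new
    trajectory endpoint when the previous action is perturbed."""
    return observed_endpoint + delta_action  # toy linear response

def execute_action(action):
    """Stand-in for rolling out the (noisy) real system."""
    return action + random.gauss(0.0, 0.01)

def irp_step(observed_endpoint, goal, num_samples=64, scale=1.0):
    """Sample delta actions, score each with the delta-dynamics model,
    and return the delta whose predicted endpoint is closest to goal."""
    best_delta, best_err = 0.0, abs(observed_endpoint - goal)
    for _ in range(num_samples):
        d = random.uniform(-scale, scale)
        err = abs(delta_dynamics(observed_endpoint, d) - goal)
        if err < best_err:
            best_delta, best_err = d, err
    return best_delta

goal, action = 3.0, 0.0
endpoint = execute_action(action)
for i in range(10):
    scale = 4.0 * (0.7 ** i)  # adaptive sampling: narrow the search range
    action += irp_step(endpoint, goal, scale=scale)
    endpoint = execute_action(action)
```

Because each iteration only needs the previously observed trajectory and a sampled perturbation, the loop sidesteps modeling the full dynamics, which is what lets the approach tolerate the sim-to-real gap described in the abstract.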
