Paper Title

Iterative Residual Policy for Goal-Conditioned Dynamic Manipulation of Deformable Objects

Authors

Cheng Chi, Benjamin Burchfiel, Eric Cousineau, Siyuan Feng, Shuran Song

Abstract

This paper tackles the task of goal-conditioned dynamic manipulation of deformable objects. This task is highly challenging due to its complex dynamics (introduced by object deformation and high-speed action) and strict task requirements (defined by a precise goal specification). To address these challenges, we present Iterative Residual Policy (IRP), a general learning framework applicable to repeatable tasks with complex dynamics. IRP learns an implicit policy via delta dynamics: instead of modeling the entire dynamical system and inferring actions from that model, IRP learns delta dynamics that predict the effect of a delta action on the previously observed trajectory. When combined with adaptive action sampling, the system can quickly optimize its actions online to reach a specified goal. We demonstrate the effectiveness of IRP on two tasks: whipping a rope to hit a target point and swinging a cloth to reach a target pose. Despite being trained only in simulation on a fixed robot setup, IRP is able to generalize efficiently to noisy real-world dynamics, new objects with unseen physical properties, and even different robot hardware embodiments, demonstrating excellent generalization relative to alternative approaches. Video is available at https://youtu.be/7h3SZ3La-oA
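The abstract's core loop (predict the effect of a delta action on the last observed trajectory, sample candidate deltas, execute the best one, and repeat with a shrinking sampling range) can be sketched as below. This is a hedged toy illustration, not the paper's implementation: `delta_dynamics` and `execute_action` are hypothetical stand-ins for the learned model and the real system, and the dynamics here are a trivial 1-D linear response rather than a rope or cloth.

```python
# Toy sketch of an IRP-style iterative loop with adaptive action sampling.
# All names (delta_dynamics, execute_action, irp_step) are illustrative
# assumptions, not the paper's actual API.
import random

random.seed(0)  # deterministic toy run

def delta_dynamics(observed_endpoint, delta_action):
    """Stand-in for the learned delta-dynamics model: predicts the new
    trajectory endpoint when the previous action is perturbed."""
    return observed_endpoint + delta_action  # toy linear response

def execute_action(action):
    """Stand-in for rolling out the (noisy) real system."""
    return action + random.gauss(0.0, 0.01)

def irp_step(observed_endpoint, goal, num_samples=64, scale=1.0):
    """Sample delta actions, score each with the delta-dynamics model,
    and return the delta whose predicted endpoint is closest to goal."""
    best_delta, best_err = 0.0, abs(observed_endpoint - goal)
    for _ in range(num_samples):
        d = random.uniform(-scale, scale)
        err = abs(delta_dynamics(observed_endpoint, d) - goal)
        if err < best_err:
            best_delta, best_err = d, err
    return best_delta

goal, action = 3.0, 0.0
endpoint = execute_action(action)
for i in range(10):
    scale = 4.0 * (0.7 ** i)  # adaptive sampling: narrow the search range
    action += irp_step(endpoint, goal, scale=scale)
    endpoint = execute_action(action)
```

Because each iteration only needs the previously observed trajectory and a sampled perturbation, the loop sidesteps modeling the full dynamics, which is what lets the approach tolerate the sim-to-real gap described in the abstract.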
