Paper title
Implicit Kinematic Policies: Unifying Joint and Cartesian Action Spaces in End-to-End Robot Learning
Paper authors
Paper abstract
Action representation is an important yet often overlooked aspect in end-to-end robot learning with deep networks. Choosing one action space over another (e.g. target joint positions, or Cartesian end-effector poses) can result in surprisingly stark performance differences between various downstream tasks -- and as a result, considerable research has been devoted to finding the right action space for a given application. However, in this work, we instead investigate how our models can discover and learn for themselves which action space to use. Leveraging recent work on implicit behavioral cloning, which takes both observations and actions as input, we demonstrate that it is possible to present the same action in multiple different spaces to the same policy -- allowing it to learn inductive patterns from each space. Specifically, we study the benefits of combining Cartesian and joint action spaces in the context of learning manipulation skills. To this end, we present Implicit Kinematic Policies (IKP), which incorporates the kinematic chain as a differentiable module within the deep network. Quantitative experiments across several simulated continuous control tasks -- from scooping piles of small objects, to lifting boxes with elbows, to precise block insertion with miscalibrated robots -- suggest IKP not only learns complex prehensile and non-prehensile manipulation from pixels better than baseline alternatives, but also can learn to compensate for small joint encoder offset errors. Finally, we also run qualitative experiments on a real UR5e to demonstrate the feasibility of our algorithm on a physical robotic system with real data. See https://tinyurl.com/4wz3nf86 for code and supplementary material.
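As a rough illustration of the core idea in the abstract (an implicit policy scores an observation together with a candidate action, so the same action can be shown to it in both joint and Cartesian form), here is a minimal toy sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the planar two-link arm, the link lengths, the hand-written `toy_energy` (a stand-in for a learned energy network), and the coarse grid search that stands in for the sampling-based optimizer used at inference in implicit behavioral cloning. In IKP the kinematic chain is a differentiable module inside the network; this sketch only shows the data flow.

```python
import math

# Hypothetical planar 2-link arm; link lengths are illustrative, not a UR5e's.
LINK_LENGTHS = (1.0, 0.8)


def forward_kinematics(joints):
    """Map two joint angles (radians) to the end-effector (x, y) position."""
    q1, q2 = joints
    l1, l2 = LINK_LENGTHS
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)


def toy_energy(observation, joints):
    """Hand-written stand-in for a learned energy E(obs, action); lower is better.

    The candidate action is presented in both spaces: the raw joint angles,
    and the Cartesian position obtained by passing them through the chain.
    """
    target_xy, preferred_joints = observation
    x, y = forward_kinematics(joints)
    cartesian_term = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
    joint_term = sum((q - p) ** 2 for q, p in zip(joints, preferred_joints))
    # The small joint-space weight breaks ties between elbow-up and elbow-down.
    return cartesian_term + 0.01 * joint_term


def select_action(observation, steps=100):
    """Pick the lowest-energy joint action on a coarse grid over [-pi, pi]^2."""
    step = 2.0 * math.pi / steps
    grid = [-math.pi + k * step for k in range(steps + 1)]
    candidates = [(q1, q2) for q1 in grid for q2 in grid]
    return min(candidates, key=lambda q: toy_energy(observation, q))


if __name__ == "__main__":
    demo_joints = (0.5, 0.7)  # hypothetical demonstrated action
    observation = (forward_kinematics(demo_joints), demo_joints)
    print("selected joints:", select_action(observation))
```

The Cartesian term alone cannot distinguish the two joint configurations that reach the same point, and the joint term alone ignores where the end effector lands, so scoring both representations of one action gives the policy access to structure from each space.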