Paper Title
Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations
Paper Authors
Paper Abstract
Perceptual understanding of a scene and of the relationships between its different components is important for the successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methods learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks, as they fail to capture the complexity of scenes with many components. In this paper, we explore the effectiveness of object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20 percent increase in performance in low-data regimes (1000 trajectories) for policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines on the task of object localization in multi-object scenes.
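To make the IBC-style policy learning mentioned in the abstract concrete, the sketch below illustrates the general recipe behind implicit behavioral cloning: train an energy model E(o, a) with an InfoNCE loss against sampled counter-example actions, then act by (approximately) minimizing energy over candidate actions. This is a minimal illustration, not the paper's implementation; the linear energy model, the synthetic expert data, and all dimensions and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expert" demonstrations: action = M_true @ obs.
# M_true and the 2-D observation/action spaces are illustrative only.
M_true = np.array([[0.5, -0.2], [0.1, 0.8]])
obs = rng.normal(size=(256, 2))      # observations (e.g. learned representations)
expert_act = obs @ M_true.T          # corresponding expert actions

# Energy model: E(o, a) = ||a - W o||^2 with learnable W.
W = rng.normal(scale=0.1, size=(2, 2))

def energy(W, o, a):
    """o: (N, d_obs); a: (N, K, d_act) candidate actions -> (N, K) energies."""
    pred = o @ W.T
    return np.sum((a - pred[:, None, :]) ** 2, axis=-1)

lr, losses = 0.05, []
for step in range(300):
    # Counter-examples: random negative actions; the expert action sits at index 0.
    neg = rng.normal(size=(obs.shape[0], 8, 2))
    cand = np.concatenate([expert_act[:, None, :], neg], axis=1)   # (N, 9, 2)
    E = energy(W, obs, cand)
    p = np.exp(-E) / np.exp(-E).sum(axis=1, keepdims=True)         # softmax over -E
    losses.append(float(np.mean(-np.log(p[:, 0]))))                # InfoNCE loss
    # Analytic gradient of the loss w.r.t. W for this quadratic energy.
    pred = obs @ W.T
    diff = cand - pred[:, None, :]
    coeff = (np.arange(cand.shape[1]) == 0).astype(float) - p      # dL/dE_k
    dL_dpred = -2 * np.einsum('nk,nkd->nd', coeff, diff)
    W -= lr * (dL_dpred.T @ obs) / obs.shape[0]

# Inference: derivative-free argmin of energy over sampled candidate actions.
o_test = np.array([[1.0, -1.0]])
cands = rng.normal(size=(1, 512, 2))
best = cands[0, np.argmin(energy(W, o_test, cands)[0])]
```

The implicit policy never outputs an action directly; it scores observation-action pairs, and inference is a search over candidates, which is what lets IBC represent discontinuous and multi-modal expert behavior.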