Concurrent Training Improves the Performance of Behavioral Cloning from Observation

Title

Concurrent Training Improves the Performance of Behavioral Cloning from Observation

Authors

Robertson, Zachary W.; Walter, Matthew R.

Abstract

Learning from demonstration is widely used as an efficient way for robots to acquire new skills. However, it typically requires that demonstrations provide full access to the state and action sequences. In contrast, learning from observation offers a way to utilize unlabeled demonstrations (e.g., video) to perform imitation learning. One approach to this is behavioral cloning from observation (BCO). The original implementation of BCO proceeds by first learning an inverse dynamics model and then using that model to estimate action labels, thereby reducing the problem to behavioral cloning. However, existing approaches to BCO require a large number of initial interactions in the first step. Here, we provide a novel theoretical analysis of BCO, introduce a modification BCO*, and show that in the semi-supervised setting, BCO* can concurrently improve both its estimate for the inverse dynamics model and the expert policy. This result allows us to eliminate the dependence on initial interactions and dramatically improve the sample complexity of BCO. We evaluate the effectiveness of our algorithm through experiments on various benchmark domains. The results demonstrate that concurrent training not only improves over the performance of BCO but also results in performance that is competitive with state-of-the-art imitation learning methods such as GAIL and Value-Dice.
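The loop described in the abstract (learn an inverse dynamics model from the agent's own interactions, use it to infer action labels for state-only expert demonstrations, then run behavioral cloning) can be illustrated with a minimal tabular sketch. This is not the authors' BCO* algorithm. It assumes a hypothetical deterministic 6-state chain environment, count-based inverse dynamics and policy tables, and epsilon-random exploration, and it refits the inverse model and the cloned policy together on every iteration instead of running a separate, large initial interaction phase. All names (`step`, `rollout`, `fit_inverse_model`, `behavioral_cloning`, `concurrent_bco`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6          # hypothetical toy chain with states 0..5 (illustrative assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: move one cell left/right, clamped at the ends."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def rollout(policy, rng, horizon=20, eps=0.3):
    """Run the current policy from state 0 with epsilon-random exploration."""
    transitions, s = [], 0
    for _ in range(horizon):
        a = rng.choice(ACTIONS) if rng.random() < eps else policy(s)
        s_next = step(s, a)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions


def fit_inverse_model(transitions):
    """Tabular inverse dynamics: most frequent action seen for each (s, s') pair."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1

    def inverse(s, s_next):
        c = counts.get((s, s_next))  # .get does not create new entries
        return c.most_common(1)[0][0] if c else None  # None = pair never observed
    return inverse


def behavioral_cloning(labeled, rng):
    """Tabular BC: majority inferred action per state; random action if state unseen."""
    counts = defaultdict(Counter)
    for s, a in labeled:
        counts[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda s: table[s] if s in table else rng.choice(ACTIONS)


def concurrent_bco(expert_states, iterations=50, seed=0):
    """Alternate interaction, inverse-model refitting, relabeling, and cloning."""
    rng = random.Random(seed)
    policy = lambda s: rng.choice(ACTIONS)  # start random; no separate warm-up phase
    experience = []
    for _ in range(iterations):
        experience += rollout(policy, rng)           # 1. interact with current policy
        inverse = fit_inverse_model(experience)      # 2. refit inverse dynamics on all data
        pairs = zip(expert_states, expert_states[1:])
        labeled = [(s, inverse(s, s2)) for s, s2 in pairs]  # 3. infer expert actions
        labeled = [(s, a) for s, a in labeled if a is not None]
        policy = behavioral_cloning(labeled, rng)    # 4. clone the inferred behavior
    return policy


if __name__ == "__main__":
    expert = list(range(N_STATES))  # state-only demo: the "expert" walks right
    learned = concurrent_bco(expert)
    print([learned(s) for s in range(N_STATES - 1)])
```

In this toy setting, each improvement of the cloned policy drives the agent further along the chain, which in turn exposes new transitions that the inverse model can learn to label. This mirrors, in a very simplified form, the idea that the inverse dynamics estimate and the policy estimate can improve together.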
