Paper Title
TaSIL: Taylor Series Imitation Learning
Paper Authors
Paper Abstract
We propose Taylor Series Imitation Learning (TaSIL), a simple augmentation to standard behavior cloning losses in the context of continuous control. TaSIL penalizes deviations in the higher-order Taylor series terms between the learned and expert policies. We show that experts satisfying a notion of $\textit{incremental input-to-state stability}$ are easy to learn, in the sense that a small TaSIL-augmented imitation loss over expert trajectories guarantees a small imitation loss over trajectories generated by the learned policy. We provide sample-complexity bounds for TaSIL that scale as $\tilde{\mathcal{O}}(1/n)$ in the realizable setting, for $n$ the number of expert demonstrations. Finally, we demonstrate experimentally the relationship between the robustness of the expert policy and the order of Taylor expansion required in TaSIL, and compare standard Behavior Cloning, DART, and DAgger with TaSIL-loss-augmented variants. In all cases, we show significant improvement over baselines across a variety of MuJoCo tasks.
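The abstract describes TaSIL as a behavior-cloning loss augmented with penalties on higher-order Taylor terms of the policy along expert trajectories. The following is a minimal first-order sketch of that idea, not the paper's reference implementation: it assumes a differentiable policy `policy(params, state)` and that the expert's action and Jacobian with respect to the state are available at each expert-visited state (the function name `tasil_loss`, the weight `lam`, and the data layout are illustrative assumptions).

```python
# Illustrative first-order TaSIL-style loss in JAX (sketch, not the paper's code).
import jax
import jax.numpy as jnp


def tasil_loss(policy, params, states, expert_actions, expert_jacobians, lam=1.0):
    """Behavior-cloning loss plus a first-order Taylor-term penalty.

    policy: callable (params, state) -> action
    states: (N, state_dim) states visited by the expert
    expert_actions: (N, act_dim) expert actions at those states
    expert_jacobians: (N, act_dim, state_dim) d(pi_expert)/ds at those states
    lam: weight on the first-order (Jacobian) deviation term
    """
    # Zeroth-order term: standard behavior cloning on actions.
    pred_actions = jax.vmap(lambda s: policy(params, s))(states)
    bc_term = jnp.mean(jnp.sum((pred_actions - expert_actions) ** 2, axis=-1))

    # First-order term: penalize deviation of the learned policy's Jacobian
    # (w.r.t. the state) from the expert's Jacobian along expert trajectories.
    pred_jacobians = jax.vmap(jax.jacrev(lambda s: policy(params, s)))(states)
    taylor_term = jnp.mean(
        jnp.sum((pred_jacobians - expert_jacobians) ** 2, axis=(-2, -1))
    )

    return bc_term + lam * taylor_term
```

Higher-order variants would add analogous penalties on higher derivatives of the policy; per the abstract, the order needed depends on the robustness (incremental input-to-state stability) of the expert.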