Learning Memory-Based Control for Human-Scale Bipedal Locomotion

Paper Title

Learning Memory-Based Control for Human-Scale Bipedal Locomotion

Authors

Siekmann, Jonah, Valluri, Srikar, Dao, Jeremy, Bermillo, Lorenzo, Duan, Helei, Fern, Alan, Hurst, Jonathan

Abstract

Controlling a non-statically stable biped is a difficult problem largely due to the complex hybrid dynamics involved. Recent work has demonstrated the effectiveness of reinforcement learning (RL) for simulation-based training of neural network controllers that successfully transfer to real bipeds. The existing work, however, has primarily used simple memoryless network architectures, even though more sophisticated architectures, such as those including memory, often yield superior performance in other RL domains. In this work, we consider recurrent neural networks (RNNs) for sim-to-real biped locomotion, allowing for policies that learn to use internal memory to model important physical properties. We show that while RNNs are able to significantly outperform memoryless policies in simulation, they do not exhibit superior behavior on the real biped due to overfitting to the simulation physics unless trained using dynamics randomization to prevent overfitting; this leads to consistently better sim-to-real transfer. We also show that RNNs could use their learned memory states to perform online system identification by encoding parameters of the dynamics into memory.
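The two ideas in the abstract, dynamics randomization and using memory for online system identification, can be illustrated with a deliberately tiny sketch. Everything below is an assumption for illustration, not the authors' method: the system is a hypothetical 1-D point mass with randomized mass and damping (not a biped), the nominal values and ranges are made up, and the "memory" is a hand-written least-squares accumulator rather than a learned RNN hidden state. It shows why history matters: the current velocity alone does not determine the mass and damping, but a recurrent state updated at every step can recover them.

```python
import random
from dataclasses import dataclass

# Hypothetical nominal values and randomization ranges (illustration only,
# NOT the ranges used in the paper).
NOMINAL_MASS = 30.0      # kg
NOMINAL_DAMPING = 5.0    # N*s/m
MASS_SCALE = (0.7, 1.3)
DAMPING_SCALE = (0.5, 2.0)


@dataclass
class Dynamics:
    mass: float
    damping: float


def sample_dynamics(rng: random.Random) -> Dynamics:
    """Dynamics randomization: draw fresh physics parameters for each episode."""
    return Dynamics(
        mass=NOMINAL_MASS * rng.uniform(*MASS_SCALE),
        damping=NOMINAL_DAMPING * rng.uniform(*DAMPING_SCALE),
    )


def step(dyn: Dynamics, v: float, u: float, dt: float) -> float:
    """Toy 1-D point mass, explicit Euler: m * dv/dt = u - c * v."""
    return v + dt * (u - dyn.damping * v) / dyn.mass


class MemoryEstimator:
    """Hand-written recurrent state (a stand-in for a learned RNN memory).

    Acceleration obeys a = theta1 * u + theta2 * (-v) with theta1 = 1/m and
    theta2 = c/m, so the state only needs the 2x2 normal-equation sums.
    """

    def __init__(self) -> None:
        self.s11 = self.s12 = self.s22 = self.b1 = self.b2 = 0.0

    def update(self, v: float, u: float, v_next: float, dt: float) -> None:
        a = (v_next - v) / dt      # observed acceleration
        x1, x2 = u, -v             # regressors for theta1, theta2
        self.s11 += x1 * x1
        self.s12 += x1 * x2
        self.s22 += x2 * x2
        self.b1 += x1 * a
        self.b2 += x2 * a

    def estimate(self) -> Dynamics:
        det = self.s11 * self.s22 - self.s12 ** 2
        if abs(det) < 1e-12:
            raise ValueError("history not yet informative enough")
        # Solve the 2x2 normal equations with Cramer's rule.
        theta1 = (self.b1 * self.s22 - self.s12 * self.b2) / det
        theta2 = (self.s11 * self.b2 - self.s12 * self.b1) / det
        return Dynamics(mass=1.0 / theta1, damping=theta2 / theta1)


def run_episode(seed: int, steps: int = 200, dt: float = 0.01):
    """One randomized episode; returns (true dynamics, estimate from memory)."""
    rng = random.Random(seed)
    dyn = sample_dynamics(rng)      # hidden from the controller
    memory = MemoryEstimator()      # memory is reset at every episode start
    v = 0.0
    for _ in range(steps):
        u = rng.uniform(-50.0, 50.0)  # exploratory force command
        v_next = step(dyn, v, u, dt)
        memory.update(v, u, v_next, dt)
        v = v_next
    return dyn, memory.estimate()


if __name__ == "__main__":
    true_dyn, est_dyn = run_episode(seed=0)
    print("true:", true_dyn)
    print("estimated from memory:", est_dyn)
```

Because this toy model is noise-free and the Euler finite difference matches the simulator exactly, the estimate should agree with the sampled parameters up to floating-point error. In the paper's setting the memory is learned end-to-end by the RNN and the dynamics are far richer, so this only conveys the intuition of "encoding parameters of the dynamics into memory".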
