Meta Reinforcement Learning for Optimal Design of Legged Robots

Paper Title

Meta Reinforcement Learning for Optimal Design of Legged Robots

Authors

Álvaro Belmonte-Baeza, Joonho Lee, Giorgio Valsecchi, Marco Hutter

Abstract

The process of robot design is a complex task, and the majority of design decisions are still based on human intuition or tedious manual tuning. A more informed way to approach this task is through computational design methods, in which design parameters are optimized concurrently with the corresponding controllers. Existing approaches, however, are strongly influenced by predefined control rules or motion templates and cannot provide end-to-end solutions. In this paper, we present a design optimization framework using model-free meta reinforcement learning, and its application to optimizing the kinematics and actuator parameters of quadrupedal robots. We use meta reinforcement learning to train a locomotion policy that can quickly adapt to different designs. This policy is used to evaluate each design instance during the design optimization. We demonstrate that the policy can control robots of different designs to track random velocity commands over various rough terrains. With controlled experiments, we show that the meta policy achieves close-to-optimal performance for each design instance after adaptation. Lastly, we compare our results against a model-based baseline and show that our approach allows higher performance while not being constrained by predefined motions or gait patterns.
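
The loop described in the abstract (adapt a shared policy to a candidate design, score the design with the adapted policy, keep the best design) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-element design vector, the quadratic stand-in reward, the single scalar "policy parameter", and the use of plain random search as the outer optimizer. No simulator or reinforcement learning is involved; the sketch only shows how the pieces fit together.

```python
import random


def adapt_policy(meta_theta, design, steps=5, lr=0.3):
    """Toy adaptation: a few gradient steps move the policy parameter toward
    a design-specific optimum (hypothetically, the sum of the link lengths)."""
    target = sum(design)
    theta = meta_theta
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - target)  # gradient of (theta - target)^2
    return theta


def evaluate_design(meta_theta, design):
    """Score a design with the adapted policy (toy reward, not a simulator)."""
    theta = adapt_policy(meta_theta, design)
    tracking_error = (theta - sum(design)) ** 2
    # Hypothetical design cost: link lengths near 0.3 are preferred.
    design_cost = sum((x - 0.3) ** 2 for x in design)
    return -(tracking_error + design_cost)


def optimize_design(meta_theta, bounds, n_samples=200, seed=0):
    """Outer loop: random search over designs, each scored after adaptation."""
    rng = random.Random(seed)
    best_design, best_score = None, float("-inf")
    for _ in range(n_samples):
        design = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        score = evaluate_design(meta_theta, design)
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score


if __name__ == "__main__":
    bounds = [(0.2, 0.4), (0.2, 0.4)]  # hypothetical link-length ranges
    design, score = optimize_design(meta_theta=0.0, bounds=bounds)
    print(design, score)
```

The point of the structure is that the outer optimizer never trains a controller from scratch: each candidate design costs only a short adaptation of the shared policy, which is what makes evaluating many designs affordable.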
