Paper Title

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

Authors

Giulia Vezzani, Dhruva Tirumala, Markus Wulfmeier, Dushyant Rao, Abbas Abdolmaleki, Ben Moran, Tuomas Haarnoja, Jan Humplik, Roland Hafner, Michael Neunert, Claudio Fantacci, Tim Hertweck, Thomas Lampe, Fereshteh Sadeghi, Nicolas Heess, Martin Riedmiller

Abstract

The ability to effectively reuse prior knowledge is a key requirement when building general and flexible Reinforcement Learning (RL) agents. Skill reuse is one of the most common approaches, but current methods have considerable limitations. For example, fine-tuning an existing policy frequently fails, as the policy can degrade rapidly early in training. In a similar vein, distillation of expert behavior can lead to poor results when given sub-optimal experts. We compare several common approaches for skill transfer on multiple domains including changes in task and system dynamics. We identify how existing methods can fail and introduce an alternative approach to mitigate these problems. Our approach learns to sequence existing temporally-extended skills for exploration but learns the final policy directly from the raw experience. This conceptual split enables rapid adaptation and thus efficient data collection, but without constraining the final solution. It significantly outperforms many classical methods across a suite of evaluation tasks, and we use a broad set of ablations to highlight the importance of different components of our method.
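To make the conceptual split concrete, below is a minimal, hypothetical Python sketch, not the paper's implementation: a scheduler picks one of a set of frozen skills and how long to run it (a uniform random placeholder standing in for the learned sequencing described in the abstract), while every raw low-level transition is stored so a separate final policy could later be trained off-policy from that data. The environment, skill set, candidate durations, and scheduler below are all illustrative assumptions, not details taken from the paper.

```python
import random
from collections import deque


class ToyEnv:
    """Tiny 1-D goal-reaching environment (illustrative placeholder only)."""

    def __init__(self, goal=10.0):
        self.goal = goal
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = -abs(self.goal - self.state)
        done = abs(self.goal - self.state) < 0.5
        return self.state, reward, done


# Frozen "skills": fixed behaviours reused only for exploration.
skills = {
    "step_forward": lambda s: 1.0,
    "step_back": lambda s: -1.0,
    "stay": lambda s: 0.0,
}
durations = [1, 5, 10]                 # candidate skill-execution lengths (assumed)
replay_buffer = deque(maxlen=100_000)  # raw (s, a, r, s', done) transitions


def pick_skill_and_duration():
    """Placeholder scheduler: uniform random choice of skill and duration.
    In the approach described above this sequencing is learned; the random
    choice here is only a stand-in for illustration."""
    return random.choice(list(skills)), random.choice(durations)


env = ToyEnv()
state, done, steps = env.reset(), False, 0
while not done and steps < 2_000:
    skill_name, k = pick_skill_and_duration()
    skill = skills[skill_name]
    for _ in range(k):                 # execute the chosen skill for k steps
        action = skill(state)
        next_state, reward, done = env.step(action)
        # Store the *raw* low-level transition: the final policy would be
        # trained off-policy from these, so it is not constrained to the
        # behaviours the skills themselves can express.
        replay_buffer.append((state, action, reward, next_state, done))
        state, steps = next_state, steps + 1
        if done or steps >= 2_000:
            break

print(f"Collected {len(replay_buffer)} raw transitions for off-policy learning.")
```

The split is visible in where the data lands: the skill choices only shape how experience is gathered, while the buffer holds unconstrained low-level transitions from which the final policy can be learned.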
