Paper Title

Hypernetworks in Meta-Reinforcement Learning

Authors

Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson

Abstract

Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and are applicable to meta-RL. However, evidence from supervised learning suggests hypernetwork performance is highly sensitive to the initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
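The core architectural idea in the abstract is that a hypernetwork maps a task representation to the weights of a per-task policy, so one shared model can still emit a distinct policy for each task. The sketch below illustrates this in NumPy under assumptions of my own: a linear target policy, a one-hidden-layer hypernetwork, and illustrative dimensions and names (`policy_action`, `z_task`, etc.). It is not the paper's implementation or its proposed initialization scheme; the comment on the output head only marks where the initialization sensitivity discussed in the abstract arises.

```python
import numpy as np

# Minimal hypernetwork sketch (illustrative names and sizes, not the
# paper's implementation). The hypernetwork maps a task embedding z to
# ALL weights of a small target policy, so each task gets its own
# policy while the weights are generated by one shared model.

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM, EMB_DIM, HID = 4, 2, 8, 16
N_TARGET_PARAMS = OBS_DIM * ACT_DIM + ACT_DIM  # linear policy: W and b

# Hypernetwork: one-hidden-layer MLP producing all target parameters.
W1 = rng.normal(0, 1 / np.sqrt(EMB_DIM), (EMB_DIM, HID))
b1 = np.zeros(HID)
# Naive fan-in scaling on the output head; the scale of this final
# layer is the kind of initialization choice the abstract says
# hypernetwork performance is highly sensitive to.
W2 = rng.normal(0, 1 / np.sqrt(HID), (HID, N_TARGET_PARAMS))
b2 = np.zeros(N_TARGET_PARAMS)

def policy_action(z, obs):
    """Generate a task-specific linear policy from embedding z, apply it to obs."""
    h = np.tanh(z @ W1 + b1)
    params = h @ W2 + b2
    W = params[: OBS_DIM * ACT_DIM].reshape(OBS_DIM, ACT_DIM)
    b = params[OBS_DIM * ACT_DIM :]
    return obs @ W + b

z_task = rng.normal(size=EMB_DIM)  # task embedding (e.g. from a task encoder)
obs = rng.normal(size=OBS_DIM)
action = policy_action(z_task, obs)
print(action.shape)  # (2,)
```

Two different task embeddings yield two different generated policies, which is how the architecture can replicate the "learn each task separately" solution while still sharing parameters across tasks.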
