Paper Title

Hypernetworks in Meta-Reinforcement Learning

Authors

Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson

Abstract

Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and are applicable to meta-RL. However, evidence from supervised learning suggests hypernetwork performance is highly sensitive to the initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
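The core architectural idea in the abstract is that a hypernetwork maps a task representation to the weights of a per-task policy, so one shared model can still emit a distinct policy for each task. The sketch below illustrates this in NumPy under assumptions of my own: a linear target policy, a one-hidden-layer hypernetwork, and illustrative dimensions and names (`policy_action`, `z_task`, etc.). It is not the paper's implementation or its proposed initialization scheme; the comment on the output head only marks where the initialization sensitivity discussed in the abstract arises.

```python
import numpy as np

# Minimal hypernetwork sketch (illustrative names and sizes, not the
# paper's implementation). The hypernetwork maps a task embedding z to
# ALL weights of a small target policy, so each task gets its own
# policy while the weights are generated by one shared model.

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM, EMB_DIM, HID = 4, 2, 8, 16
N_TARGET_PARAMS = OBS_DIM * ACT_DIM + ACT_DIM  # linear policy: W and b

# Hypernetwork: one-hidden-layer MLP producing all target parameters.
W1 = rng.normal(0, 1 / np.sqrt(EMB_DIM), (EMB_DIM, HID))
b1 = np.zeros(HID)
# Naive fan-in scaling on the output head; the scale of this final
# layer is the kind of initialization choice the abstract says
# hypernetwork performance is highly sensitive to.
W2 = rng.normal(0, 1 / np.sqrt(HID), (HID, N_TARGET_PARAMS))
b2 = np.zeros(N_TARGET_PARAMS)

def policy_action(z, obs):
    """Generate a task-specific linear policy from embedding z, apply it to obs."""
    h = np.tanh(z @ W1 + b1)
    params = h @ W2 + b2
    W = params[: OBS_DIM * ACT_DIM].reshape(OBS_DIM, ACT_DIM)
    b = params[OBS_DIM * ACT_DIM :]
    return obs @ W + b

z_task = rng.normal(size=EMB_DIM)  # task embedding (e.g. from a task encoder)
obs = rng.normal(size=OBS_DIM)
action = policy_action(z_task, obs)
print(action.shape)  # (2,)
```

Two different task embeddings yield two different generated policies, which is how the architecture can replicate the "learn each task separately" solution while still sharing parameters across tasks.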
