Paper Title


Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

Authors

Aleksei Petrenko, Zhehui Huang, Tushar Kumar, Gaurav Sukhatme, Vladlen Koltun

Abstract


Increasing the scale of reinforcement learning experiments has allowed researchers to achieve unprecedented results in both training sophisticated agents for video games, and in sim-to-real transfer for robotics. Typically such experiments rely on large distributed systems and require expensive hardware setups, limiting wider access to this exciting area of research. In this work we aim to solve this problem by optimizing the efficiency and resource utilization of reinforcement learning algorithms instead of relying on distributed computation. We present the "Sample Factory", a high-throughput training system optimized for a single-machine setting. Our architecture combines a highly efficient, asynchronous, GPU-based sampler with off-policy correction techniques, allowing us to achieve throughput higher than $10^5$ environment frames/second on non-trivial control problems in 3D without sacrificing sample efficiency. We extend Sample Factory to support self-play and population-based training and apply these techniques to train highly capable agents for a multiplayer first-person shooter game. The source code is available at https://github.com/alex-petrenko/sample-factory
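The core architectural idea in the abstract, decoupling environment sampling from policy learning so that neither side blocks the other, can be illustrated with a minimal sketch. This is not Sample Factory's actual implementation (the real system uses separate processes, batched GPU inference, and off-policy correction to handle policy lag); it is a hypothetical thread-and-queue toy showing the producer/consumer split, with made-up constants `ROLLOUT_LEN`, `NUM_WORKERS`, and `ROLLOUTS_PER_WORKER`:

```python
# Minimal sketch (not Sample Factory's actual code) of the asynchronous
# sampler/learner split: sampler workers push rollouts into a shared queue
# while a single learner consumes them, so simulation never waits on updates.
import queue
import threading

ROLLOUT_LEN = 8          # hypothetical rollout length, in frames
NUM_WORKERS = 4          # hypothetical number of sampler workers
ROLLOUTS_PER_WORKER = 3  # hypothetical rollouts each worker collects

def sampler_worker(worker_id: int, out: queue.Queue) -> None:
    """Simulate an environment loop that emits fixed-length rollouts."""
    for step in range(ROLLOUTS_PER_WORKER):
        # In the real system each frame would be an observation/action pair
        # produced by stepping the environment with the current policy.
        rollout = [(worker_id, step, frame) for frame in range(ROLLOUT_LEN)]
        out.put(rollout)
    out.put(None)  # sentinel: this worker is finished

def learner(inbox: queue.Queue, num_workers: int) -> int:
    """Consume rollouts as they arrive; count total frames processed."""
    frames, finished = 0, 0
    while finished < num_workers:
        rollout = inbox.get()
        if rollout is None:
            finished += 1
        else:
            frames += len(rollout)  # a real learner would run a policy update here
    return frames

rollout_queue: queue.Queue = queue.Queue()
workers = [
    threading.Thread(target=sampler_worker, args=(i, rollout_queue))
    for i in range(NUM_WORKERS)
]
for w in workers:
    w.start()
total_frames = learner(rollout_queue, NUM_WORKERS)
for w in workers:
    w.join()
print(total_frames)  # 4 workers * 3 rollouts * 8 frames = 96
```

Because the learner consumes rollouts that were collected under slightly older policy versions, a real asynchronous system also needs the off-policy correction the abstract mentions; this sketch omits that entirely.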
