Paper Title

Evolving Curricula with Regret-Based Environment Design

Authors

Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Abstract

It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at accelagent.github.io.
