Paper Title

"Other-Play" for Zero-Shot Coordination

Paper Authors

Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster

Paper Abstract

We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP theoretically as well as experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results, we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
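To make the core idea concrete, below is a minimal sketch of how an other-play-style objective differs from a self-play objective on a toy one-shot coordination game. The payoff matrix, the symmetry group, and the game itself are illustrative assumptions for this sketch, not details given in the abstract; the intuition is that averaging the partner's policy over known symmetries devalues conventions that break a symmetry arbitrarily.

```python
import numpy as np

# Toy one-shot coordination game (illustrative assumption, not from the paper's abstract).
# Both players pick one of 3 actions; actions 0 and 1 are interchangeable under a
# symmetry of the game, while action 2 is distinguishable but pays slightly less.
PAYOFF = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.9],
])

# Known symmetries: action relabelings that leave the payoff matrix invariant
# (here, the identity and swapping actions 0 and 1).
SYMMETRIES = [np.array([0, 1, 2]), np.array([1, 0, 2])]

def sp_value(pi):
    """Self-play value: both players use the same policy pi."""
    return pi @ PAYOFF @ pi

def op_value(pi):
    """Other-play-style value: the partner's policy is relabeled by a random symmetry."""
    return np.mean([pi @ PAYOFF @ pi[perm] for perm in SYMMETRIES])

# Compare two deterministic conventions under the two objectives.
for name, pi in [("pick action 0", np.array([1.0, 0.0, 0.0])),
                 ("pick action 2", np.array([0.0, 0.0, 1.0]))]:
    print(f"{name}: SP={sp_value(pi):.2f}  OP={op_value(pi):.2f}")

# Under SP the arbitrary convention "pick action 0" looks best (value 1.0), but under
# the OP objective it drops to 0.5, because an independently trained partner may have
# broken the 0/1 symmetry the other way; the symmetry-invariant action 2 keeps its
# value (0.9) and is preferred, which is more robust for zero-shot coordination.
```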
