Paper Title

Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games

Authors

Edward Hughes, Thomas W. Anthony, Tom Eccles, Joel Z. Leibo, David Balduzzi, Yoram Bachrach

Abstract

Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that naïve multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.
