计算零和多人游戏的零和协调的团队最大平衡

论文标题

计算零和多人游戏的零和协调的团队最大平衡

Computing Ex Ante Coordinated Team-Maxmin Equilibria in Zero-Sum Multiplayer Extensive-Form Games

论文作者

Zhang, Youzhi, An, Bo, Černý, Jakub

论文摘要

计算游戏理论在现代世界中都有许多应用在对抗情况下和社会利益的优化。尽管有许多用于在两人交互中计算解决方案的算法，但在多人游戏相互作用中找到最佳策略有效地仍然是一个开放的挑战。本文着重于用零和广泛形式游戏中的协调设备（TMecor）计算多人队最大值平衡。当一组球员对抗对手时，Tmecor对场景进行了建模。当一个团队共同努力以击败目标球员但禁止沟通时，可以在纸牌游戏（例如在桥梁和扑克中）找到这种情况；在现实世界中，在森林保护行动中，当协调群体在禁止非法伐木者中的接触有限时。现有的算法由于其高计算成本而难以有效地找到Tmecor。为了计算较大游戏中的Tmecor，我们做出以下主要贡献：（1）我们为团队提出了一个混合形式的战略代表，该战略代表保留了平衡集；（2）我们基于新颖的最佳响应甲骨文，在无限策略空间中介绍了一种具有保证的有限时间收敛的圆柱生成算法；（3）我们开发了一种相关的代表技术，以最佳响应甲骨文中的多线性项的精确表示；（4）我们通过实验表明，我们的算法比大型游戏中先前的最新算法快几个数量级。

Computational game theory has many applications in the modern world in both adversarial situations and the optimization of social good. While there exist many algorithms for computing solutions in two-player interactions, finding optimal strategies in multiplayer interactions efficiently remains an open challenge. This paper focuses on computing the multiplayer Team-Maxmin Equilibrium with Coordination device (TMECor) in zero-sum extensive-form games. TMECor models scenarios when a team of players coordinates ex ante against an adversary. Such situations can be found in card games (e.g., in Bridge and Poker), when a team works together to beat a target player but communication is prohibited; and also in real world, e.g., in forest-protection operations, when coordinated groups have limited contact during interdicting illegal loggers. The existing algorithms struggle to find a TMECor efficiently because of their high computational costs. To compute a TMECor in larger games, we make the following key contributions: (1) we propose a hybrid-form strategy representation for the team, which preserves the set of equilibria; (2) we introduce a column-generation algorithm with a guaranteed finite-time convergence in the infinite strategy space based on a novel best-response oracle; (3) we develop an associated-representation technique for the exact representation of the multilinear terms in the best-response oracle; and (4) we experimentally show that our algorithm is several orders of magnitude faster than prior state-of-the-art algorithms in large games.

下载PDF全文

下载文献需遵守相关版权规定

论文标题