最后的卷聚力融合到颤抖的人的完美平衡

论文标题

最后的卷聚力融合到颤抖的人的完美平衡

Last-iterate Convergence to Trembling-hand Perfect Equilibria

论文作者

Bernasconi, Martino, Marchesi, Alberto, Trovò, Francesco

论文摘要

设计有效的算法以在顺序游戏中找到NASH平衡（NE）改进至关重要。确实，众所周知，NE有几个弱点，因为它可能会规定在游戏的那些部分中从未达到过平衡的那些部分。 NE改进，例如广泛的完美平衡（EFPE），通过考虑玩家的错误可能性来修改此类弱点。这在现实世界中的应用程序中至关重要，在现实世界中，通常涉及有限的理性玩家，事实证明，这对于促进超人人的表演也很有用。然而，只有很少的作品解决了计算NE改进的问题。他们中的大多数人建议通过线性编程来找到确切的NE精炼，因此，这些算法没有扩展到现实世界大小的游戏的潜力。另一方面，利用顺序游戏树结构的现有迭代算法仅提供融合保证近似改进。在本文中，我们提供了第一种有效的最后近题算法，该算法可证明在两人零和零连续游戏中收敛到EFPE，并具有不完美的信息。我们的算法通过跟踪一系列适当定义的正规化游戏的平衡来起作用。为此，它使用了一个量身定制的过程，该过程将近期归于此类游戏的平衡。至关重要的是，可以通过访问游戏树来有效执行此过程执行的更新，从而使我们的算法比基于线性的基于线性编程的竞争对手更可扩展。最后，我们在游戏的标准测试台上评估了我们的算法，表明它产生的策略对玩家的错误比最先进的NE-NE-Compant算法更强大。

Designing efficient algorithms to find Nash equilibrium (NE) refinements in sequential games is of paramount importance in practice. Indeed, it is well known that the NE has several weaknesses, since it may prescribe to play sub-optimal actions in those parts of the game that are never reached at the equilibrium. NE refinements, such as the extensive-form perfect equilibrium (EFPE), amend such weaknesses by accounting for the possibility of players' mistakes. This is crucial in real-world applications, where bounded rationality players are usually involved, and it turns out being useful also in boosting the performances of superhuman agents for recreational games like Poker. Nevertheless, only few works addressed the problem of computing NE refinements. Most of them propose algorithms finding exact NE refinements by means of linear programming, and, thus, these do not have the potential of scaling up to real-world-size games. On the other hand, existing iterative algorithms that exploit the tree structure of sequential games only provide convergence guarantees to approximate refinements. In this paper, we provide the first efficient last-iterate algorithm that provably converges to an EFPE in two-player zero-sum sequential games with imperfect information. Our algorithm works by tracking a sequence of equilibria of suitably-defined, regularized-perturbed games. In order to do that, it uses a procedure that is tailored to converge last-iterate to the equilibria of such games. Crucially, the updates performed by such a procedure can be performed efficiently by visiting the game tree, thus making our algorithm potentially more scalable than its linear-programming-based competitors. Finally, we evaluate our algorithm on a standard testbed of games, showing that it produces strategies which are much more robust to players' mistakes than those of state-of-the-art NE-computation algorithms.

下载PDF全文

下载文献需遵守相关版权规定

论文标题