广泛相关的平衡的无重格学习动力

论文标题

广泛相关的平衡的无重格学习动力

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

论文作者

Celli, Andrea, Marchesi, Alberto, Farina, Gabriele, Gatti, Nicola

论文摘要

在正常游戏中，简单，未耦合的无regret动力学与相关的平衡是多代理系统理论的著名结果。具体而言，已经知道了20多年的众所周知，当所有玩家都试图在重复的正常游戏中最大程度地减少其内部遗憾时，游戏的经验频率会收敛到正常形式相关的平衡。广泛的形式（即树形）游戏通过对顺序和同时移动以及私人信息进行建模，从而概括了正常形式的游戏。由于游戏中部分信息的顺序性质和存在，广泛形式的相关性具有与正常形式的属性明显不同，而正常形式的相关性仍然是开放的研究方向。已经提出了广泛的形式相关平衡（EFCE）作为自然的广泛形式与正常形式相关的平衡。但是，目前尚不清楚EFCE是否是由于未耦合的代理动力学而出现的。在本文中，我们给出了第一个未耦合的无重格动力学，将$ n $ n $ - 玩家的通用和大型游戏收敛于EFCE，并带有完美的回忆。首先，我们在广泛的游戏中介绍了触发遗憾的概念，这扩展了正常游戏中的内部遗憾。当每个玩家的触发后悔低时，游戏的经验频率接近EFCE。然后，我们给出有效的无触发式算法。我们的算法在每个决策点为玩家的每个决策点将触发遗憾分解为本地子问题，并在每个决策点在本地解决方案中构建了玩家的全球策略。

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was currently unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

下载PDF全文

下载文献需遵守相关版权规定

论文标题