Paper Title

Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains

Authors

Etesami, S. Rasoul

Abstract

We consider a subclass of $n$-player stochastic games, in which players have their own internal state/action spaces while they are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can receive only realizations of their payoffs, not the actual functions, and cannot observe each other's states/actions. For this class of games, we first show that finding a stationary Nash equilibrium (NE) policy without any assumption on the reward functions is intractable. However, for general reward functions, we develop polynomial-time learning algorithms based on dual averaging and dual mirror descent, which converge in terms of the averaged Nikaido-Isoda distance to the set of $ε$-NE policies almost surely or in expectation. In particular, under extra assumptions on the reward functions such as social concavity, we derive polynomial upper bounds on the number of iterates to achieve an $ε$-NE policy with high probability. Finally, we evaluate the effectiveness of the proposed algorithms in learning $ε$-NE policies using numerical experiments for energy management in smart grids.
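The paper's full algorithms operate on occupation measures of the players' Markov chains and are not reproduced here. As a minimal sketch of the dual-averaging building block under bandit (realized-payoff) feedback, the toy below runs entropic dual averaging for two players in a hypothetical $2\times 2$ matrix game: each player sees only the realized payoff of its own action, forms an importance-weighted payoff estimate, and maps the running estimate to a mixed policy via softmax (the mirror map for the simplex). The payoff matrix `A` and step-size choice are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_averaging_step(cum_grad, t, eta=1.0):
    """Entropic dual averaging: softmax of the scaled cumulative
    payoff estimate, with a 1/sqrt(t) decaying effective step."""
    z = eta * cum_grad / np.sqrt(t)
    z = z - z.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical zero-sum 2x2 game (player 1 maximizes, player 2 minimizes).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

cum1 = np.zeros(2)  # player 1's cumulative payoff estimates
cum2 = np.zeros(2)  # player 2's cumulative payoff estimates
for t in range(1, 5001):
    p1 = dual_averaging_step(cum1, t)
    p2 = dual_averaging_step(cum2, t)
    a1 = rng.choice(2, p=p1)
    a2 = rng.choice(2, p=p2)
    # Bandit feedback: each player observes only its realized payoff
    # and builds an unbiased estimate by importance weighting.
    g1 = np.zeros(2); g1[a1] = A[a1, a2] / p1[a1]
    g2 = np.zeros(2); g2[a2] = -A[a1, a2] / p2[a2]
    cum1 += g1
    cum2 += g2
```

The same update applies coordinate-wise over each player's own simplex, which is what keeps the per-iteration cost polynomial even though the joint state/action space grows exponentially in $n$.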
