Paper Title

Learning in Games with Quantized Payoff Observations

Paper Authors

Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos

Abstract

This paper investigates the impact of feedback quantization on multi-agent learning. In particular, we analyze the equilibrium convergence properties of the well-known "follow the regularized leader" (FTRL) class of algorithms when players can only observe a quantized (and possibly noisy) version of their payoffs. In this information-constrained setting, we show that coarser quantization triggers a qualitative shift in the convergence behavior of FTRL schemes. Specifically, if the quantization error lies below a threshold value (which depends only on the underlying game and not on the level of uncertainty entering the process or the specific FTRL variant under study), then (i) FTRL is attracted to the game's strict Nash equilibria with arbitrarily high probability; and (ii) the algorithm's asymptotic rate of convergence remains the same as in the non-quantized case. Otherwise, for larger quantization levels, these convergence properties are lost altogether: players may fail to learn anything beyond their initial state, even with full information on their payoff vectors. This is in contrast to the impact of quantization in continuous optimization problems, where the quality of the obtained solution degrades smoothly with the quantization level.
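
To make the setting in the abstract concrete, here is a minimal sketch (not the paper's actual experiments) of an exponential-weights instance of FTRL in which each player only observes a uniformly quantized, noisy copy of its payoff vector. The function names (`run_ftrl_quantized`, `payoff_fn`), the uniform quantizer with step `delta`, and the example game are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logit_choice(scores, eta=1.0):
    """Entropic-regularizer FTRL choice map: softmax of the cumulative scores."""
    z = eta * (scores - scores.max())      # shift for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()

def quantize(v, delta):
    """Uniform quantizer: round each payoff entry to the nearest multiple of delta."""
    return delta * np.round(v / delta)

def run_ftrl_quantized(payoff_fn, n_players, n_actions, delta, noise_std, T, rng):
    """FTRL where each player aggregates only quantized, noisy payoff observations."""
    scores = [np.zeros(n_actions) for _ in range(n_players)]
    for _ in range(T):
        x = [logit_choice(y) for y in scores]            # current mixed strategies
        for i in range(n_players):
            v_i = payoff_fn(i, x)                        # true payoff vector of player i
            noisy = v_i + noise_std * rng.standard_normal(n_actions)
            scores[i] += quantize(noisy, delta)          # only the quantized signal is seen
    return [logit_choice(y) for y in scores]

# Example: a 2x2 coordination game whose pure Nash equilibria are strict.
A = np.array([[2.0, 0.0], [0.0, 1.0]])                   # player 0's payoff matrix
B = A.copy()                                             # symmetric payoffs for player 1

def payoff_fn(i, x):
    return A @ x[1] if i == 0 else B.T @ x[0]

rng = np.random.default_rng(0)
print(run_ftrl_quantized(payoff_fn, n_players=2, n_actions=2,
                         delta=0.1, noise_std=0.05, T=2000, rng=rng))
```

With a quantization step that is small relative to the game's payoff gaps, this toy run concentrates play on a strict equilibrium, in line with the abstract's first claim; the abstract's negative result predicts that a sufficiently coarse `delta` can instead leave the players stuck near their initial (uniform) state.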
