Paper Title

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Authors

Zhongxiang Dai, Yizhou Chen, Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho

Abstract

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.
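To make the level-k recursive reasoning idea concrete, below is a minimal Python sketch of one decision step, written from the abstract alone rather than from the paper's actual algorithm. Everything specific here is an assumption for illustration: discrete scalar action grids, an RBF-kernel GP surrogate, a GP-UCB acquisition with parameter `beta`, a uniform-random level-0 policy, access to a surrogate of the other agent's payoff, and all function names (`level_k_action`, `gp_posterior`, `ucb`).

```python
# Illustrative sketch (not the paper's implementation) of level-k recursive
# reasoning layered on GP-based Bayesian optimization in a two-agent
# repeated game with scalar actions.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between row-stacked joint action profiles."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """GP posterior mean/std at test profiles Xs given observations (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.clip(np.diag(rbf_kernel(Xs, Xs)) - (v**2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)

def ucb(mu, sigma, beta=2.0):
    """GP-UCB acquisition: optimistic estimate of the payoff."""
    return mu + np.sqrt(beta) * sigma

def level_k_action(k, agent, actions, X_hist, y_hist, rng):
    """Level-k action for `agent` (0 or 1).

    actions: [grid_of_agent0, grid_of_agent1] (1-D arrays)
    X_hist:  past joint profiles, shape (n, 2), column i = agent i's action
    y_hist:  [payoffs_of_agent0, payoffs_of_agent1] observed at X_hist
    Assumes the agent can also model the other agent's payoff with a GP.
    """
    if k == 0:
        # Assumed level-0 policy: act uniformly at random.
        grid = actions[agent]
        return grid[rng.integers(len(grid))]
    # Simulate the other agent reasoning one level below me.
    other = 1 - agent
    a_other = level_k_action(k - 1, other, actions, X_hist, y_hist, rng)
    # Best-respond with GP-UCB on my own payoff surrogate, fixing the
    # simulated action of the other agent.
    mine = actions[agent]
    Xs = np.zeros((len(mine), 2))
    Xs[:, agent] = mine
    Xs[:, other] = a_other
    mu, sigma = gp_posterior(X_hist, y_hist[agent], Xs)
    return mine[int(np.argmax(ucb(mu, sigma)))]

# One round of a toy constant-sum game: agent 0 reasons at level 2,
# one level above agent 1 at level 1, mirroring the abstract's setting.
rng = np.random.default_rng(0)
actions = [np.linspace(0, 1, 25), np.linspace(0, 1, 25)]
X_hist = rng.random((5, 2))                       # 5 past joint profiles
payoff0 = np.sin(3 * X_hist.sum(1))               # toy payoffs, agent 0
y_hist = [payoff0, -payoff0]                      # constant-sum: agent 1 = -agent 0
a0 = level_k_action(2, 0, actions, X_hist, y_hist, rng)
a1 = level_k_action(1, 1, actions, X_hist, y_hist, rng)
```

The recursion is the point: a level-k agent picks its action by first predicting what a level-(k-1) opponent would do, terminating at the assumed level-0 policy, which is why reasoning exactly one level above the other agents suffices in the abstract's convergence claim.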
