Paper Title
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Paper Authors
Paper Abstract
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scaling up with RL is that existing algorithms for solving MFGs require mixing approximated quantities such as strategies or $q$-values. This is far from trivial with non-linear function approximators that enjoy good generalization properties, e.g., neural networks. We propose two methods to address this shortcoming. The first learns a mixed strategy by distilling historical data into a neural network and is applied to the Fictitious Play algorithm. The second is an online mixing method based on regularization that does not require memorizing historical data or previous estimates; it is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of deep RL algorithms to solve various MFGs. In addition, we show that they outperform state-of-the-art (SotA) baselines from the literature.
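
The following is a minimal, tabular sketch (not the paper's deep RL implementation) of the two mixing ideas the abstract contrasts: Fictitious-Play-style averaging of past best responses, which the paper replaces by distillation into a network, and Online-Mirror-Descent-style mixing through regularization, which avoids storing history. The toy single-state setup, and names such as n_actions, tau, and the random stand-in Q-values, are illustrative assumptions.

import numpy as np

n_actions, n_iterations, tau = 4, 50, 1.0
rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Idea 1: Fictitious-Play-style mixing.
# The mixed policy is the uniform average of all past best responses; with
# function approximation the paper distills this average into a neural
# network, while here we simply keep the running average explicitly.
avg_policy = np.full(n_actions, 1.0 / n_actions)
for k in range(1, n_iterations + 1):
    q_values = rng.normal(size=n_actions)          # stand-in for the inner RL step
    best_response = np.eye(n_actions)[q_values.argmax()]
    avg_policy += (best_response - avg_policy) / k  # incremental uniform average

# Idea 2: Online-Mirror-Descent-style mixing.
# No history is stored: the policy is a softmax of cumulated Q-values, so past
# information is mixed in implicitly through the entropy regularization tau.
cum_q = np.zeros(n_actions)
for _ in range(n_iterations):
    q_values = rng.normal(size=n_actions)          # stand-in for policy evaluation
    cum_q += q_values
    omd_policy = softmax(cum_q / tau)

print("FP mixed policy :", np.round(avg_policy, 3))
print("OMD policy      :", np.round(omd_policy, 3))

In this sketch the averaging (FP) and the cumulated, regularized update (OMD) play the role that policy distillation and online regularized mixing play in the paper's deep RL setting.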