Paper Title

Partially Observable Mean Field Reinforcement Learning

Authors

Sriram Ganapathi Subramanian, Matthew E. Taylor, Mark Crowley, Pascal Poupart

Abstract

Traditional multi-agent reinforcement learning algorithms are not scalable to environments with more than a few agents, since these algorithms are exponential in the number of agents. Recent research has introduced successful methods to scale multi-agent reinforcement learning algorithms to many agent scenarios using mean field theory. Previous work in this field assumes that an agent has access to exact cumulative metrics regarding the mean field behaviour of the system, which it can then use to take its actions. In this paper, we relax this assumption and maintain a distribution to model the uncertainty regarding the mean field of the system. We consider two different settings for this problem. In the first setting, only agents in a fixed neighbourhood are visible, while in the second setting, the visibility of agents is determined at random based on distances. For each of these settings, we introduce a Q-learning based algorithm that can learn effectively. We prove that this Q-learning estimate stays very close to the Nash Q-value (under a common set of assumptions) for the first setting. We also empirically show our algorithms outperform multiple baselines in three different games in the MAgents framework, which supports large environments with many agents learning simultaneously to achieve possibly distinct goals.
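
The abstract describes a Q-learning scheme in which each agent, rather than observing the exact mean field, handles uncertainty about it and estimates it from the agents it can actually see. The sketch below is only an illustration of that idea, not the authors' algorithm: the tabular shapes, the Dirichlet-style smoothing used to represent uncertainty about unobserved agents, the coarse discretisation of the mean action, and the Boltzmann policy are all assumptions made for this example.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's algorithm):
# tabular mean-field Q-learning where the population's mean action is
# estimated only from visible neighbours, as in the partially observable
# setting described in the abstract.

N_STATES, N_ACTIONS = 10, 4          # assumed problem sizes
ALPHA, GAMMA, TEMP = 0.1, 0.95, 1.0  # assumed learning hyperparameters

# Q(state, own action, discretised mean action of the neighbourhood)
Q = np.zeros((N_STATES, N_ACTIONS, N_ACTIONS))

def estimate_mean_action(visible_neighbour_actions, prior=1.0):
    """Estimate the population's action distribution from visible neighbours.
    The Dirichlet-style prior encodes uncertainty about unobserved agents."""
    counts = np.bincount(np.asarray(visible_neighbour_actions, dtype=int),
                         minlength=N_ACTIONS).astype(float)
    counts += prior
    return counts / counts.sum()

def boltzmann_policy(state, mean_action_idx):
    """Softmax over Q-values conditioned on the estimated mean action."""
    prefs = Q[state, :, mean_action_idx] / TEMP
    prefs -= prefs.max()
    probs = np.exp(prefs)
    return probs / probs.sum()

def q_update(state, action, reward, next_state, visible_now, visible_next):
    """One Q-learning step conditioned on the estimated mean action."""
    mu = estimate_mean_action(visible_now)
    mu_next = estimate_mean_action(visible_next)
    m = int(np.argmax(mu))            # coarse discretisation for the table
    m_next = int(np.argmax(mu_next))
    target = reward + GAMMA * Q[next_state, :, m_next].max()
    Q[state, action, m] += ALPHA * (target - Q[state, action, m])

# Example step: the agent saw neighbours take actions [0, 1, 1] before acting
# and [1, 1, 2] afterwards (hypothetical values).
q_update(state=2, action=1, reward=0.5, next_state=3,
         visible_now=[0, 1, 1], visible_next=[1, 1, 2])
```

The design choice illustrated here is that partial observability enters only through estimate_mean_action: the fewer neighbours an agent sees, the more the prior dominates, which mirrors the abstract's idea of modelling uncertainty about the mean field rather than assuming exact access to it.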
