Paper Title
Unified Reinforcement Q-Learning for Mean Field Game and Control Problems
Paper Authors
Paper Abstract
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the \emph{same} algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm is set in discrete time and space, where the agent provides not only an action to the environment but also a distribution of the state, in order to account for the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case, which are used as benchmarks for the results of our algorithm.
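To make the two-timescale idea concrete, the following is a minimal sketch (not the authors' code) of tabular mean field Q-learning in which the Q-table and the estimate of the population distribution are updated with two separate learning rates, rho_Q and rho_mu. The toy environment, the mean-field-dependent reward, and the specific rate schedules are illustrative assumptions; only the structure of the two coupled, model-free updates reflects the description in the abstract.

```python
# Minimal sketch (assumptions noted above): tabular two-timescale mean field
# Q-learning on a toy finite environment.
import numpy as np

n_states, n_actions = 5, 3
gamma = 0.9                        # discount factor (assumed)
eps = 0.1                          # epsilon-greedy exploration (assumed)
rng = np.random.default_rng(0)

def step(x, a, mu):
    """Hypothetical environment: next state and mean-field-dependent reward."""
    x_next = (x + a - 1) % n_states          # toy dynamics
    reward = -abs(a - 1) - mu[x_next]        # toy reward penalizing crowded states
    return x_next, reward

Q = np.zeros((n_states, n_actions))          # state-action value table
mu = np.ones(n_states) / n_states            # estimate of the state distribution
x = 0

for n in range(1, 200_000):
    # Two learning rates on different timescales; per the abstract, the ratio
    # of these rates determines whether the MFG or the MFC solution is learned.
    # The power schedules below are assumptions for illustration only.
    rho_Q = 1.0 / n ** 0.55
    rho_mu = 1.0 / n ** 0.85

    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[x]))
    x_next, r = step(x, a, mu)

    # Model-free update of the population-distribution estimate, using only the
    # observed state (the true distribution is not observed by the agent).
    delta = np.zeros(n_states)
    delta[x_next] = 1.0
    mu = mu + rho_mu * (delta - mu)

    # Standard Q-learning update with the mean-field-dependent reward.
    Q[x, a] += rho_Q * (r + gamma * Q[x_next].max() - Q[x, a])
    x = x_next

print("Estimated distribution:", np.round(mu, 3))
print("Greedy policy:", Q.argmax(axis=1))
```

In this sketch, swapping which of the two rates decays faster changes which fixed point the coupled updates settle on; the paper's benchmark for checking such output would be the explicit LQ solutions mentioned in the abstract.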