Paper Title

Decentralized multi-agent reinforcement learning with shared actions

Authors

Rajesh K. Mishra, Deepanshu Vasal, Sriram Vishwanath

Abstract

In this paper, we propose a novel model-free reinforcement learning algorithm to compute the optimal policies for a multi-agent system with $N$ cooperative agents, where each agent privately observes its own type and publicly observes the others' actions. The goal is to maximize their collective reward. The problem belongs to the broad class of decentralized control problems with partial information. We use the common agent approach, wherein a fictitious common agent picks the best policy based on a belief over the current states of the agents. These beliefs are updated individually by each agent from its current belief and the action histories. Updating the belief state without knowledge of the system dynamics is a challenge. In this paper, we employ a particle filter, known as the bootstrap filter, run distributively across the agents to update the beliefs. We provide a model-free reinforcement learning (RL) method for this multi-agent partially observable Markov decision process, using the particle filter and sampled trajectories to estimate the optimal policies for the agents. We showcase our results with the help of a smart-grid application in which the users strive to reduce the collective cost of power for all the agents in the grid. Finally, we compare the performance of model-based and model-free implementations of the RL algorithm, establishing the effectiveness of the particle filter (PF) method.
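The bootstrap filter referred to above maintains the belief as a set of sampled states: each particle is propagated through a black-box dynamics simulator and reweighted by the likelihood of the publicly observed outcome, then the set is resampled. The sketch below is a minimal, generic bootstrap-filter belief update in Python; the names `transition_sample` and `obs_likelihood` and the flat state representation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def bootstrap_filter_update(particles, action, observation,
                            transition_sample, obs_likelihood, rng):
    """One bootstrap-filter belief update (hypothetical sketch).

    particles:         (N,) array of sampled agent states (the current belief)
    action:            the publicly observed action
    observation:       the public signal used to weight the particles
    transition_sample: fn(state, action, rng) -> next state; a simulator,
                       so no explicit transition model is required
    obs_likelihood:    fn(observation, state, action) -> likelihood weight
    """
    # Propagate each particle through the black-box dynamics simulator.
    proposed = np.array([transition_sample(x, action, rng) for x in particles])
    # Weight each propagated particle by the likelihood of the observation.
    weights = np.array([obs_likelihood(observation, x, action) for x in proposed])
    weights = weights / weights.sum()
    # Resample with replacement so the particle set again represents the belief.
    idx = rng.choice(len(proposed), size=len(proposed), p=weights)
    return proposed[idx]
```

In the decentralized setting described in the abstract, each agent would maintain one such particle set per agent and apply this update after every publicly observed action, keeping the beliefs consistent without access to the system dynamics.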
