Paper Title
Learning to Communicate Using Counterfactual Reasoning
Paper Authors
Paper Abstract
Learning to communicate in order to share state information is an active research problem in multi-agent reinforcement learning (MARL). Three major challenges must be overcome to learn a valid communication protocol: the credit assignment problem, the non-stationarity of the communication environment, and the creation of influenceable agents. This paper introduces multi-agent counterfactual communication learning (MACC), a novel method that adapts counterfactual reasoning to overcome the credit assignment problem for communicating agents. To overcome the non-stationarity of the communication environment while learning the communication Q-function, MACC constructs that Q-function from the action policy of the other agents and the Q-function of the action environment. Additionally, a social loss function is introduced to create the influenceable agents that are required to learn a valid communication protocol. Our experiments show that MACC outperforms the state-of-the-art baselines in four different scenarios in the Particle environment.
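As a rough illustration of the idea in the abstract, the sketch below builds a communication Q-value for each message from the receiving agent's action policy and the action-environment Q-values, then subtracts a counterfactual baseline that marginalises over the sender's own messages. This is a minimal tabular sketch under stated assumptions, not the paper's implementation: the function names, array shapes and toy numbers are hypothetical, and the baseline shown is the generic counterfactual-advantage form rather than a formula taken from the paper.

```python
import numpy as np

def communication_q(other_policy, action_q):
    """Communication Q-value per message (illustrative assumption).

    other_policy[m, a]: probability that the receiving agent picks
                        action a after receiving message m.
    action_q[a]:        action-environment Q-value of receiver action a.
    Returns q_comm[m] = sum_a other_policy[m, a] * action_q[a].
    """
    return other_policy @ action_q

def counterfactual_advantage(q_comm, message_policy):
    """Advantage of each message over a counterfactual baseline.

    The baseline is the expected communication Q-value when the
    message is drawn from the sender's own message policy.
    """
    baseline = message_policy @ q_comm
    return q_comm - baseline

# Toy example: 2 possible messages, 2 possible receiver actions.
other_policy = np.array([[0.9, 0.1],   # receiver behaviour after message 0
                         [0.2, 0.8]])  # receiver behaviour after message 1
action_q = np.array([1.0, 0.0])        # action 0 is the valuable action
message_policy = np.array([0.5, 0.5])  # sender's current message policy

q_comm = communication_q(other_policy, action_q)
advantage = counterfactual_advantage(q_comm, message_policy)
```

In this toy setting, message 0 steers the receiver toward the valuable action, so it receives a positive advantage while message 1 receives a negative one. Because the communication Q-value is derived from the receiver's current policy rather than learned separately, it stays consistent as that policy changes, which is the intuition behind addressing non-stationarity in this way.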