PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

Paper Title

PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

Authors

Pengyi Li, Hongyao Tang, Tianpei Yang, Xiaotian Hao, Tong Sang, Yan Zheng, Jianye Hao, Matthew E. Taylor, Wenyuan Tao, Zhen Wang, Fazl Barez

Abstract

Learning to collaborate is critical in Multi-Agent Reinforcement Learning (MARL). Previous works promote collaboration by maximizing the correlation of agents' behaviors, which is typically characterized by Mutual Information (MI) in different forms. However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. PMIC uses a new collaboration criterion measured by the MI between global states and joint actions. Based on this criterion, the key idea of PMIC is maximizing the MI associated with superior collaborative behaviors and minimizing the MI associated with inferior ones. The two MI objectives play complementary roles by facilitating better collaborations while avoiding falling into sub-optimal ones. Experiments on a wide range of MARL benchmarks show the superior performance of PMIC compared with other algorithms.
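
The abstract's central observation, that sub-optimal joint behaviors can be just as strongly correlated as good ones, can be illustrated with a toy calculation. The sketch below is not the paper's estimator or algorithm; it is a simple plug-in estimate of the mutual information I(S; U) between a discrete global state S and a joint action U, computed from sample counts. The function name `empirical_mi`, the three sample sets, and the labels "superior" and "inferior" are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def empirical_mi(pairs):
    """Plug-in estimate of I(S; U) in nats from (state, joint_action) samples."""
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (s, u)
    count_s = Counter(s for s, _ in pairs)    # counts of s
    count_u = Counter(u for _, u in pairs)    # counts of u
    mi = 0.0
    for (s, u), c in joint.items():
        # p(s,u) * log(p(s,u) / (p(s) * p(u))), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_s[s] * count_u[u]))
    return mi


# Two states, two agents with binary actions (all data here is made up).
# Assumed "superior" behavior: both agents match the state.
superior = [(0, (0, 0)), (1, (1, 1))] * 50
# Assumed "inferior" behavior: both agents do the opposite, equally consistently.
inferior = [(0, (1, 1)), (1, (0, 0))] * 50
# Uncoordinated behavior: joint actions are independent of the state.
uncoordinated = [(s, (a, b)) for s in (0, 1) for a in (0, 1) for b in (0, 1)] * 25

mi_superior = empirical_mi(superior)
mi_inferior = empirical_mi(inferior)
mi_uncoordinated = empirical_mi(uncoordinated)
```

In this toy setting the assumed superior and inferior sample sets have the same MI, while the uncoordinated set has none, so MI alone cannot tell the two correlated behaviors apart. That is the motivation the abstract gives for maximizing MI only for superior behaviors and minimizing it for inferior ones.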
