Paper Title


Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning

Authors

Zhenhui Ye, Yining Chen, Guanghua Song, Bowei Yang, Shen Fan

Abstract


Exploration of the high-dimensional state-action space is one of the biggest challenges in Reinforcement Learning (RL), especially in multi-agent domains. We present a novel technique called Experience Augmentation, which enables time-efficient, boosted learning based on fast, fair, and thorough exploration of the environment. It can be combined with arbitrary off-policy MARL algorithms and is applicable to both homogeneous and heterogeneous environments. We demonstrate our approach by combining it with MADDPG and verifying its performance in two homogeneous environments and one heterogeneous environment. In the best-performing scenario, MADDPG with experience augmentation reaches the convergence reward of vanilla MADDPG in 1/4 of the real training time, and its converged reward beats the original model by a significant margin. Our ablation studies show that experience augmentation is a crucial ingredient that accelerates the training process and boosts convergence.
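The abstract does not spell out the augmentation mechanism, but for homogeneous agents a natural form of experience augmentation is to permute agent indices within each stored transition, so that one environment step yields several equally valid replay samples. The sketch below illustrates this idea only; the function name `augment_transition`, the buffer layout, and the number of extra copies are my assumptions, not the paper's specification:

```python
import itertools
import random
from collections import deque

def augment_transition(obs, actions, rewards, next_obs, n_extra=3):
    """Return the original multi-agent transition plus up to n_extra
    copies with agent indices permuted (valid only if agents are
    homogeneous, i.e. interchangeable)."""
    n_agents = len(obs)
    out = [(obs, actions, rewards, next_obs)]
    # All non-identity permutations of agent indices.
    perms = list(itertools.permutations(range(n_agents)))[1:]
    for p in random.sample(perms, min(n_extra, len(perms))):
        out.append((
            [obs[i] for i in p],
            [actions[i] for i in p],
            [rewards[i] for i in p],
            [next_obs[i] for i in p],
        ))
    return out

# Usage: push every augmented copy into the shared replay buffer,
# multiplying the effective sample count per environment step.
buffer = deque(maxlen=100_000)
obs      = [[0.0], [1.0], [2.0]]   # per-agent observations
acts     = [0, 1, 2]               # per-agent actions
rews     = [1.0, 0.5, 0.0]         # per-agent rewards
next_obs = [[0.1], [1.1], [2.1]]
for t in augment_transition(obs, acts, rews, next_obs):
    buffer.append(t)
```

Each permuted copy keeps the per-agent alignment between observations, actions, and rewards intact, which is what makes the extra samples unbiased for an off-policy learner such as MADDPG.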
