Paper Title

Learning to Play against Any Mixture of Opponents

Paper Authors

Max Olan Smith, Thomas Anthony, Yongzhao Wang, Michael P. Wellman

Paper Abstract

Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a complicated cyber-security game. We find that Q-Mixing is able to successfully transfer knowledge across any mixture of opponents. We next consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent classifier -- trained in parallel to Q-learning, using the same data -- and use the classifier results to refine the mixing of Q-values. We find that Q-Mixing augmented with the opponent classifier function performs comparably, and with lower variance, than training directly against a mixed-strategy opponent.
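To make the mixing step concrete, here is a minimal NumPy sketch of the core idea from the abstract: per-opponent Q-values are averaged under the believed opponent distribution, and the resulting mixed Q-values are used greedily without further training. All function names, array shapes, and the specific rule used to combine the prior with the classifier output are illustrative assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

def q_mixing_policy(q_values_per_opponent, opponent_distribution):
    """Act against a mixture of opponents by averaging per-opponent Q-values.

    q_values_per_opponent: (num_opponents, num_actions) array holding the
        Q-values for the current observation, one row per pure-strategy
        opponent (assumed to have been learned separately beforehand).
    opponent_distribution: (num_opponents,) array with the believed
        probability of facing each pure-strategy opponent.
    Returns the greedy action under the mixed Q-values.
    """
    q = np.asarray(q_values_per_opponent, dtype=float)
    w = np.asarray(opponent_distribution, dtype=float)
    w = w / w.sum()                    # normalize the mixture weights
    mixed_q = w @ q                    # weighted average over opponents
    return int(np.argmax(mixed_q))     # greedy action, no further training

def q_mixing_with_classifier(q_values_per_opponent, prior, classifier_probs):
    """Refine the mixing weights with an opponent classifier's output.

    classifier_probs: (num_opponents,) array, the classifier's estimate of
        which opponent is being faced given observations so far (trained in
        parallel with Q-learning on the same data, per the abstract).
    The simple prior-times-classifier combination below is an assumption of
    this sketch.
    """
    posterior = np.asarray(prior, dtype=float) * np.asarray(classifier_probs, dtype=float)
    posterior = posterior / posterior.sum()
    return q_mixing_policy(q_values_per_opponent, posterior)

if __name__ == "__main__":
    # Two pure-strategy opponents, three actions: illustrative numbers only.
    q_per_opp = [[1.0, 0.2, 0.0],      # Q-values learned vs. opponent A
                 [0.1, 0.3, 0.9]]      # Q-values learned vs. opponent B
    print(q_mixing_policy(q_per_opp, [0.5, 0.5]))
    print(q_mixing_with_classifier(q_per_opp, [0.5, 0.5], [0.9, 0.1]))
```

In the sketch, shifting the weights from a uniform prior toward one opponent (as the classifier example does) can change the greedy action, which is the mechanism by which in-play observations refine the constructed policy.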
