Using Myerson values

Paper title

Towards a more efficient computation of individual attribute and policy contribution for post-hoc explanation of cooperative multi-agent systems using Myerson values

Authors

Angelotti, Giorgio, Díaz-Rodríguez, Natalia

Abstract

A quantitative assessment of the global importance of an agent in a team is as valuable as gold for strategists, decision-makers, and sports coaches. Yet retrieving this information is not trivial, since in a cooperative task it is hard to isolate the performance of an individual from that of the whole team. Moreover, the relationship between the role of an agent and its personal attributes is not always clear. In this work we conceive an application of Shapley analysis for studying the contribution of both agent policies and attributes, putting them on an equal footing. Since the computation is NP-hard and its cost scales exponentially with the number of participants in a transferable utility coalitional game, we resort to exploiting a priori knowledge about the rules of the game to constrain the relations between the participants over a graph. We hence propose a method to determine a Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent System. Assuming a simulator of the system is available, the graph structure allows dynamic programming to be exploited to assess the importance values much faster. We test the proposed approach in a proof-of-case environment deploying both hardcoded policies and policies obtained via Deep Reinforcement Learning. The proposed paradigm is less computationally demanding than trivially computing the Shapley values and provides great insight not only into the importance of an agent in a team but also into the attributes needed to deploy the policy at its best.
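
To make the quantities named in the abstract concrete, the following is a minimal sketch of the textbook definitions only, not the authors' algorithm. It computes exact Shapley values by enumerating every coalition, and Myerson values as the Shapley values of the graph-restricted game, in which a coalition is worth the sum of the worths of its connected components. The three-player game, the `toy_value` function, and the line graph A-B-C are hypothetical illustrations that do not come from the paper. The enumeration here is still exponential; the Hierarchical Knowledge Graph and dynamic programming described in the abstract are not reproduced.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that player p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def connected_components(coalition, edges):
    """Split a coalition into groups connected by edges that lie inside it."""
    remaining = set(coalition)
    components = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            node = stack.pop()
            for a, b in edges:
                if a == node:
                    neighbour = b
                elif b == node:
                    neighbour = a
                else:
                    continue
                # Only follow edges whose other end is in the coalition
                # and has not been visited yet.
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        components.append(frozenset(component))
    return components


def myerson_values(players, value, edges):
    """Shapley values of the graph-restricted game (Myerson values)."""
    def restricted(coalition):
        return sum(value(c) for c in connected_components(coalition, edges))
    return shapley_values(players, restricted)


# Hypothetical toy game: a coalition's worth is the square of its size.
def toy_value(coalition):
    return float(len(coalition) ** 2)


players = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]  # hypothetical line graph A - B - C

shapley = shapley_values(players, toy_value)
myerson = myerson_values(players, toy_value, edges)
```

In this toy game the three players are interchangeable, so their Shapley values are equal, while the Myerson values favour B because A and C can only cooperate through it.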
