General Characterization of Agents by States they Visit

Paper title

General Characterization of Agents by States they Visit

Authors

Anssi Kanervisto, Tomi Kinnunen, Ville Hautamäki

Abstract

Behavioural characterizations (BCs) of decision-making agents, or their policies, are used to study the outcomes of training algorithms and as part of the algorithms themselves, to encourage unique policies, match an expert policy, or restrict changes to the policy per update. However, previously presented solutions are not applicable in general, due to a lack of expressive power, computational constraints, or constraints on the policy or environment. Furthermore, many BCs rely on the actions of policies. We discuss and demonstrate how these BCs can be misleading, especially in stochastic environments, and propose a novel solution based on what states policies visit. We run experiments to evaluate the quality of the proposed BC against baselines and evaluate its use in studying training algorithms, novelty search and trust-region policy optimization. The code is available at https://github.com/miffyli/policy-supervectors.
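
As a rough illustration of characterizing a policy by the states it visits, the sketch below compares empirical state-visitation distributions in a toy stochastic environment. Everything in it is an assumption made for illustration: the 11-state chain, the slip probability, the two hand-written policies, and the use of total variation distance are not taken from the paper, and this is not the authors' method (their implementation is in the repository linked above).

```python
import random
from collections import Counter

# Hypothetical toy setup (not from the paper): states 0..10 on a line.
N_STATES = 11
START = 5
SLIP = 0.2  # probability that the environment reverses the chosen move


def rollout(policy, rng, steps=20):
    """Run one episode and return the list of visited states."""
    state = START
    visited = [state]
    for _ in range(steps):
        move = policy(state)  # -1 (left) or +1 (right)
        if rng.random() < SLIP:  # stochastic environment: the move flips
            move = -move
        state = min(N_STATES - 1, max(0, state + move))
        visited.append(state)
    return visited


def visitation(policy, seed, episodes=200):
    """Empirical state-visitation distribution of a policy."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(episodes):
        counts.update(rollout(policy, rng))
    total = sum(counts.values())
    return [counts[s] / total for s in range(N_STATES)]


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def go_right(state):
    return 1


def go_left(state):
    return -1


# Same policy under different random seeds vs. two different policies.
same = tv_distance(visitation(go_right, 0), visitation(go_right, 1))
diff = tv_distance(visitation(go_right, 0), visitation(go_left, 0))
print(f"same policy, different seeds: {same:.3f}")
print(f"different policies:           {diff:.3f}")
```

In this toy setting the distance between two runs of the same policy should stay small despite the environment noise, while the two opposite policies should end up far apart, because the comparison looks only at where each policy ends up spending its time.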
