Paper title
Probe-Based Interventions for Modifying Agent Behavior
Paper authors
Paper abstract
Neural nets are powerful function approximators, but the behavior of a given neural net, once trained, cannot be easily modified. We wish, however, for people to be able to influence neural agents' actions even though the agents were never trained with humans, which we formalize as a human-assisted decision-making problem. Inspired by prior art initially developed for model explainability, we develop a method for updating representations in pre-trained neural nets according to externally specified properties. In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks, from image classifiers to agents in multi-agent reinforcement learning settings.