Paper title
Addressing reward bias in Adversarial Imitation Learning with neutral reward functions
Paper authors
Paper abstract
Generative Adversarial Imitation Learning (GAIL) suffers from a fundamental problem of reward bias, which stems from the choice of reward function used in the algorithm. Different types of bias affect different types of environments, which are broadly divided into survival and task-based environments. We provide a theoretical sketch of why existing reward functions would fail in imitation learning scenarios in task-based environments with multiple terminal states. We also propose a new reward function for GAIL which outperforms existing GAIL methods on task-based environments with single and multiple terminal states and effectively overcomes both survival and termination bias.
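The two biases named in the abstract can be illustrated with a minimal sketch. It assumes a discriminator output D in (0, 1), read as the probability that a state-action pair came from the expert, and uses two reward forms commonly seen in GAIL implementations: -log(1 - D), which is always positive, and log(D), which is always negative. The function names and the constant discriminator value are hypothetical, chosen only for illustration. This is not the reward function proposed in the paper, which the abstract does not specify.

```python
import math


def positive_reward(d: float) -> float:
    """Commonly used form -log(1 - D); strictly positive for D in (0, 1)."""
    return -math.log(1.0 - d)


def negative_reward(d: float) -> float:
    """Commonly used form log(D); strictly negative for D in (0, 1)."""
    return math.log(d)


def episode_return(reward_fn, d: float, steps: int) -> float:
    """Undiscounted return of an episode in which the discriminator
    outputs the same value d at every step (a simplifying assumption)."""
    return steps * reward_fn(d)


if __name__ == "__main__":
    d = 0.5  # hypothetical: the discriminator is maximally unsure
    for steps in (10, 100):
        print(
            steps,
            round(episode_return(positive_reward, d, steps), 3),
            round(episode_return(negative_reward, d, steps), 3),
        )
```

Under these assumptions, a strictly positive reward makes longer episodes score higher regardless of behavior (survival bias), while a strictly negative reward makes shorter episodes score higher (termination bias). In a task-based environment, the first can discourage finishing the task and the second can push the agent toward whichever terminal state is reached soonest.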