Paper title
Attention Option-Critic
Paper authors
Paper abstract
Temporal abstraction in reinforcement learning is the ability of an agent to learn and use high-level behaviors, called options. The option-critic architecture provides a gradient-based end-to-end learning method to construct options. We propose an attention-based extension to this framework, which enables the agent to learn to focus different options on different aspects of the observation space. We show that this leads to behaviorally diverse options which are also capable of state abstraction, and prevents the degeneracy problems of option domination and frequent option switching that occur in option-critic, while achieving a similar sample complexity. We also demonstrate the more efficient, interpretable, and reusable nature of the learned options in comparison with option-critic, through different transfer learning tasks. Experimental results in a relatively simple four-rooms environment and the more complex ALE (Arcade Learning Environment) showcase the efficacy of our approach.