Paper Title

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

Authors

Peng Zhang, Jianye Hao, Weixun Wang, Hongyao Tang, Yi Ma, Yihai Duan, Yan Zheng

Abstract

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start of learning and the intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the Knowledge Guided Policy Network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller that represents human knowledge and a refine module that fine-tunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge with RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
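To make the two components in the abstract concrete, here is a minimal PyTorch sketch of the overall idea for a discrete-action task. It is purely illustrative: the class names (FuzzyRuleController, KoGuNPolicy), the Gaussian membership functions, and the convex mixing of the prior with the refined policy are our assumptions about one plausible instantiation, not the paper's actual architecture.

```python
# Illustrative sketch of a knowledge-guided policy network, assuming:
# - each human rule = trainable fuzzy membership (center, width) per state
#   dimension plus a fixed (suboptimal) action preference;
# - a refine module that corrects the rule prior from the raw state;
# - a differentiable mix so the whole policy trains end to end.
import torch
import torch.nn as nn


class FuzzyRuleController(nn.Module):
    """Turns human rules into a prior action distribution."""

    def __init__(self, n_rules, state_dim, n_actions):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, state_dim))
        self.widths = nn.Parameter(torch.ones(n_rules, state_dim))
        # Fixed human action preference per rule (random here as a stand-in).
        self.register_buffer("rule_actions",
                             torch.randint(0, n_actions, (n_rules,)))
        self.n_actions = n_actions

    def forward(self, state):                         # state: (B, state_dim)
        # Gaussian membership degree of the state under each rule.
        diff = state.unsqueeze(1) - self.centers      # (B, n_rules, state_dim)
        firing = torch.exp(-(diff / self.widths).pow(2).sum(-1))  # (B, n_rules)
        onehot = nn.functional.one_hot(self.rule_actions, self.n_actions).float()
        # Firing-strength-weighted action preference, normalized to a distribution.
        prefs = firing @ onehot                       # (B, n_actions)
        return prefs / (prefs.sum(-1, keepdim=True) + 1e-8)


class KoGuNPolicy(nn.Module):
    """Combines the rule prior with a learned refine module."""

    def __init__(self, state_dim, n_actions, n_rules=8, hidden=64):
        super().__init__()
        self.controller = FuzzyRuleController(n_rules, state_dim, n_actions)
        self.refine = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )
        self.mix = nn.Parameter(torch.tensor(0.5))    # learnable prior weight

    def forward(self, state):
        prior = self.controller(state)
        logits = self.refine(torch.cat([state, prior], dim=-1))
        w = torch.sigmoid(self.mix)
        # Convex mix: early in training the policy leans on the human prior.
        probs = w * torch.softmax(logits, dim=-1) + (1 - w) * prior
        return torch.distributions.Categorical(probs=probs)
```

Because the output is an ordinary action distribution, the sketch plugs into any policy-gradient loss, e.g.:

```python
policy = KoGuNPolicy(state_dim=4, n_actions=2)    # CartPole-like dimensions
dist = policy(torch.randn(32, 4))
actions = dist.sample()                           # actions to execute
loss = -dist.log_prob(actions).mean()             # REINFORCE-style term
```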
