Paper Title
State-Conditioned Adversarial Subgoal Generation
Paper Authors
Paper Abstract
Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy HRL often suffers from a non-stationary high-level policy, since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach that mitigates this non-stationarity by adversarially encouraging the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning is implemented by training a simple state-conditioned discriminator network, which determines the compatibility of subgoals, concurrently with the high-level policy. Comparison to state-of-the-art algorithms shows that our approach improves both learning efficiency and performance on challenging continuous control tasks.
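The abstract's core mechanism can be sketched in code. The following is a minimal, hypothetical PyTorch illustration, not the paper's actual implementation: a discriminator conditioned on the current state scores a subgoal's compatibility, and is trained to separate subgoals the low-level policy actually reached ("compatible") from subgoals freshly proposed by the high-level policy. All network sizes, names, and the BCE-based loss are assumptions for illustration.

```python
import torch
import torch.nn as nn


class StateConditionedDiscriminator(nn.Module):
    """Hypothetical sketch: scores how compatible a subgoal is with the
    low-level policy, conditioned on the current state."""

    def __init__(self, state_dim, subgoal_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + subgoal_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one compatibility logit per (state, subgoal)
        )

    def forward(self, state, subgoal):
        # Condition on the state by concatenating it with the subgoal.
        return self.net(torch.cat([state, subgoal], dim=-1))


def discriminator_loss(disc, state, reached_subgoal, proposed_subgoal):
    """Standard GAN-style objective (an assumption, not the paper's exact loss):
    label subgoals the low-level policy reached as positive and subgoals
    proposed by the high-level policy as negative."""
    bce = nn.BCEWithLogitsLoss()
    real_logits = disc(state, reached_subgoal)
    fake_logits = disc(state, proposed_subgoal)
    return (bce(real_logits, torch.ones_like(real_logits))
            + bce(fake_logits, torch.zeros_like(fake_logits)))


if __name__ == "__main__":
    torch.manual_seed(0)
    disc = StateConditionedDiscriminator(state_dim=4, subgoal_dim=2)
    states = torch.randn(8, 4)
    reached = torch.randn(8, 2)     # stand-in for subgoals actually achieved
    proposed = torch.randn(8, 2)    # stand-in for high-level policy proposals
    loss = discriminator_loss(disc, states, reached, proposed)
    loss.backward()  # the discriminator trains concurrently with the high-level policy
```

In this framing, the high-level policy would receive the discriminator's compatibility score as an additional (adversarial) training signal, steering it toward subgoals the current low-level policy can realize.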