Paper Title
Learning Relative Return Policies With Upside-Down Reinforcement Learning
Paper Authors
Paper Abstract
Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed return. We show that upside-down reinforcement learning can learn to carry out such commands online in a tabular bandit setting and in CartPole with non-linear function approximation. By doing so, we demonstrate the power of this family of methods and open the way for their practical use under more complicated command structures.
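To make the command-conditioned setup concrete, below is a minimal sketch of an upside-down-RL-style learner in a two-armed bandit, where each command asks for the observed return to be above or below a scalar threshold. This is not the paper's implementation; the arm probabilities, the command set, and the tabular counting update are all assumptions chosen only to illustrate the idea of learning to carry out relative-return commands via a supervised-style update.

```python
import numpy as np

# Illustrative sketch only: a command-conditioned learner in a 2-armed
# Bernoulli bandit. Commands have the form "make the return {relation}
# threshold". All constants and names here are assumptions, not the
# authors' setup.

rng = np.random.default_rng(0)
ARM_PROBS = [0.2, 0.8]          # hypothetical per-arm reward probabilities
RELATIONS = [">=", "<"]         # relation part of the command
THRESHOLDS = [0.5]              # scalar part of the command (returns are 0/1)

# Tabular "policy": how often each arm satisfied each command (Laplace prior).
success = np.ones((len(RELATIONS), len(THRESHOLDS), 2))
trials = np.full((len(RELATIONS), len(THRESHOLDS), 2), 2.0)

def satisfies(ret, rel_idx, thr):
    return ret >= thr if RELATIONS[rel_idx] == ">=" else ret < thr

for step in range(2000):
    rel = rng.integers(len(RELATIONS))        # sample a command to follow
    thr_idx = rng.integers(len(THRESHOLDS))
    thr = THRESHOLDS[thr_idx]

    # Act: mostly pick the arm most likely to satisfy the sampled command.
    probs = success[rel, thr_idx] / trials[rel, thr_idx]
    arm = int(np.argmax(probs)) if rng.random() > 0.1 else int(rng.integers(2))

    ret = float(rng.random() < ARM_PROBS[arm])  # observed return (0 or 1)

    # Supervised-style update with relabeling: one observed return tells us
    # which commands this arm would have satisfied, so update all of them.
    for r_i in range(len(RELATIONS)):
        for t_i, t in enumerate(THRESHOLDS):
            trials[r_i, t_i, arm] += 1
            success[r_i, t_i, arm] += satisfies(ret, r_i, t)

# In this toy setting the learner ends up choosing arm 1 for the command
# "return >= 0.5" and arm 0 for "return < 0.5".
print(success / trials)
```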