Paper Title
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
Paper Authors
Paper Abstract
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL setting, here we show that this single algorithm can also work in the imitation learning and offline RL settings, and can be extended to the goal-conditioned RL setting and even to the meta-RL setting. With a general agent architecture, a single UDRL agent can learn across all paradigms.
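The core idea in the abstract, conditioning the policy on a desired return (and horizon) and training it with ordinary supervised learning to predict the logged actions, can be illustrated with a minimal sketch. The network, names, dimensions, and random toy data below are hypothetical and assume PyTorch; they are not the paper's architecture or code.

```python
# Hypothetical sketch of the UDRL idea: a command-conditioned policy trained
# with plain supervised learning (no bootstrapping, no off-policy corrections,
# no discount factor).
import torch
import torch.nn as nn

class CommandConditionedPolicy(nn.Module):
    """Maps (state, command = [desired return, desired horizon]) to action logits."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, hidden),  # +2 for the (return, horizon) command
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor, command: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, command], dim=-1))

# Toy supervised training step on a batch of logged transitions
# (the same update form applies in the imitation / offline settings).
state_dim, num_actions, batch_size = 4, 2, 32
policy = CommandConditionedPolicy(state_dim, num_actions)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(batch_size, state_dim)              # observed states
actions = torch.randint(0, num_actions, (batch_size,))   # actions actually taken
returns_to_go = torch.randn(batch_size, 1)                # observed future returns
horizons = torch.randint(1, 100, (batch_size, 1)).float() # remaining time steps

commands = torch.cat([returns_to_go, horizons], dim=-1)
logits = policy(states, commands)
loss = nn.functional.cross_entropy(logits, actions)  # pure supervised loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At evaluation time one would feed the agent the command it should try to achieve (e.g. a high desired return over a chosen horizon) and act greedily or by sampling from the predicted action distribution; this usage pattern is an assumption consistent with the abstract, not a quotation of the paper's procedure.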