Paper Title
Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning
Paper Authors
Paper Abstract
We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
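The two steps described in the abstract can be illustrated with a minimal sketch: an E-step that relabels each trajectory with the goal it actually achieved (as in HER), followed by an M-step that performs a supervised, maximum-likelihood update of the goal-conditioned policy. Everything below (the toy transition rule, the tabular policy, the final-state relabeling rule, and the learning rate) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

# Hedged sketch of a hindsight E-step / supervised M-step loop.
# The environment, goal-sampling rule, and tabular policy are assumptions
# made for illustration only.

rng = np.random.default_rng(0)
n_states, n_actions = 8, 4

# Tabular goal-conditioned policy: logits[goal, state, action].
logits = np.zeros((n_states, n_states, n_actions))

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def rollout(goal, horizon=10):
    """Collect one trajectory under the current policy (toy dynamics)."""
    s, states, actions = 0, [], []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=softmax(logits[goal, s]))
        states.append(s)
        actions.append(a)
        s = (s + a) % n_states            # assumed toy transition rule
    return states, actions, s

for it in range(200):
    goal = rng.integers(n_states)
    states, actions, achieved = rollout(goal)

    # E-step (sketch): relabel the trajectory with the goal it actually
    # achieved, so the otherwise-sparse reward signal becomes informative.
    hindsight_goal = achieved

    # M-step (sketch): supervised update -- raise the log-likelihood of the
    # actions taken, conditioned on the hindsight goal (gradient ascent).
    for s, a in zip(states, actions):
        probs = softmax(logits[hindsight_goal, s])
        grad = -probs
        grad[a] += 1.0                    # gradient of log softmax at action a
        logits[hindsight_goal, s] += 0.1 * grad
```

In this sketch the M-step is a plain log-likelihood update on relabeled data, which is what lets the policy be trained with supervised-learning machinery rather than a high-variance policy-gradient estimator; the paper's actual objective and parameterization differ in detail.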