Paper Title
Meta-Reinforcement Learning Using Model Parameters
Paper Authors
Paper Abstract
In meta-reinforcement learning, an agent is trained in multiple different environments and attempts to learn a meta-policy that can efficiently adapt to a new environment. This paper presents RAMP, a Reinforcement learning Agent using Model Parameters that utilizes the idea that a neural network trained to predict environment dynamics encapsulates the environment information. RAMP is constructed in two phases: in the first phase, a multi-environment parameterized dynamic model is learned. In the second phase, the model parameters of the dynamic model are used as context for the multi-environment policy of the model-free reinforcement learning agent.
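The two phases described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a scalar linear system s' = a*s + b*u in place of a neural dynamics model, and it fits each environment independently rather than learning a single multi-environment parameterized model. The names `fit_dynamics`, `collect_transitions`, and `policy_input` are hypothetical, and training the model-free policy is omitted.

```python
import random


def collect_transitions(a, b, n, rng):
    """Sample n transitions (s, u, s') from a toy environment s' = a*s + b*u."""
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-1.0, 1.0)
        data.append((s, u, a * s + b * u))
    return data


def fit_dynamics(transitions):
    """Phase 1 (toy): least-squares fit of s' ~ a*s + b*u via the 2x2 normal equations."""
    sum_ss = sum(s * s for s, u, nxt in transitions)
    sum_uu = sum(u * u for s, u, nxt in transitions)
    sum_su = sum(s * u for s, u, nxt in transitions)
    sum_sn = sum(s * nxt for s, u, nxt in transitions)
    sum_un = sum(u * nxt for s, u, nxt in transitions)
    det = sum_ss * sum_uu - sum_su * sum_su
    a = (sum_sn * sum_uu - sum_un * sum_su) / det
    b = (sum_un * sum_ss - sum_sn * sum_su) / det
    return (a, b)


def policy_input(state, model_params):
    """Phase 2 (toy): the policy sees the state plus the model parameters as context."""
    return [state, *model_params]


rng = random.Random(0)

# Hypothetical environments, each defined by its true (a, b) dynamics coefficients.
environments = {"env_a": (0.9, 0.5), "env_b": (0.5, -1.0)}

# Phase 1: fit a dynamics model per environment; its parameters become the context.
contexts = {
    name: fit_dynamics(collect_transitions(a, b, 50, rng))
    for name, (a, b) in environments.items()
}

# Phase 2: a multi-environment policy would consume this context-augmented input.
example_input = policy_input(0.2, contexts["env_b"])
```

In this sketch the fitted coefficients play the role of the context: two environments with different dynamics yield different parameter vectors, so a single policy can in principle tell them apart from its input alone.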