Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Paper Title

Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

Authors

Lin, Ke, Gong, Liang, Li, Xudong, Sun, Te, Chen, Binhao, Liu, Chengliang, Zhang, Zhengfeng, Pu, Jian, Zhang, Junping

Abstract

Although deep reinforcement learning (DRL) algorithms have achieved important results in many control tasks, they still suffer from sample inefficiency and unstable training, which are usually caused by sparse rewards. Recently, some reinforcement learning from demonstration (RLfD) methods have shown promise in overcoming these problems. However, they usually require a considerable number of demonstrations. To tackle these challenges, we propose DRL-EG (DRL with efficient guidance), a sample-efficient algorithm built on the soft actor-critic (SAC) algorithm, in which a discriminator D(s) and a guider G(s) are modeled from a small number of expert demonstrations. The discriminator determines the appropriate guidance states, and the guider guides the agent toward better exploration during the training phase. Empirical evaluation on several continuous control tasks verifies the effectiveness and performance improvements of our method over other RL and RLfD counterparts. Experimental results also show that DRL-EG can help the agent escape from a local optimum.
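
The abstract does not say how D(s) and G(s) are built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a nearest-neighbor lookup over the demonstration set stands in for both models. The class `NearestNeighborGuidance`, the function `select_action`, and the parameters `radius` and `guide_prob` are all hypothetical names introduced here for illustration.

```python
import math
import random


class NearestNeighborGuidance:
    """Hypothetical stand-in for D(s) and G(s), built from a few demonstrations.

    Assumption: a state counts as a "guidance state" when it lies within
    `radius` of some demonstrated state. The paper may define this differently.
    """

    def __init__(self, demo_states, demo_actions, radius):
        self.demo_states = [tuple(s) for s in demo_states]
        self.demo_actions = [tuple(a) for a in demo_actions]
        self.radius = radius

    def _nearest(self, state):
        # Euclidean distance from the query state to every demonstrated state.
        dists = [math.dist(state, s) for s in self.demo_states]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def discriminator(self, state):
        """D(s): True when the state is close enough to a demonstrated state."""
        _, dist = self._nearest(state)
        return dist <= self.radius

    def guider(self, state):
        """G(s): the action recorded at the nearest demonstrated state."""
        idx, _ = self._nearest(state)
        return self.demo_actions[idx]


def select_action(state, policy_action, guidance, rng, guide_prob=0.5):
    """In guidance states, take the guider's action with probability guide_prob;
    otherwise fall back to the action proposed by the learned policy."""
    if guidance.discriminator(state) and rng.random() < guide_prob:
        return guidance.guider(state)
    return tuple(policy_action)
```

In a training loop, `policy_action` would come from the SAC policy, and the returned action would be the one executed in the environment. Mixing the two sources by a fixed probability is a design choice made for this sketch only.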
