Title
GA+DDPG+HER: Genetic Algorithm-Based Function Optimizer in Deep Reinforcement Learning for Robotic Manipulation Tasks
Authors
Abstract
Agents can use reinforcement learning (RL) to make decisions based on a reward function. However, the choice of values for the learning algorithm's parameters can have a substantial impact on the overall learning process. To discover near-optimal values for these learning parameters, in this study we extend our previously proposed genetic algorithm-based Deep Deterministic Policy Gradient and Hindsight Experience Replay approach (referred to as GA+DDPG+HER). We apply the GA+DDPG+HER methodology to the robotic manipulation tasks FetchReach, FetchSlide, FetchPush, FetchPick&Place, and DoorOpening, and, with a few adjustments, to the AuboReach environment. Our experimental analysis demonstrates that our method achieves noticeably better performance and learns faster than the original algorithm. We also provide evidence that GA+DDPG+HER outperforms the current approaches. The final results support our assertion and offer sufficient evidence that automating the parameter tuning procedure is crucial and reduces learning time by as much as 57%.
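To illustrate the kind of search the abstract describes, the following is a minimal sketch (not the authors' code) of a genetic algorithm tuning DDPG+HER hyperparameters. The parameter names, their ranges, and the train_and_evaluate stub are illustrative assumptions standing in for a full DDPG+HER training run on a Fetch task.

import random

# Assumed hyperparameters and search ranges, for illustration only.
PARAM_RANGES = {
    "actor_lr":  (1e-4, 1e-2),
    "critic_lr": (1e-4, 1e-2),
    "polyak":    (0.90, 0.999),
    "gamma":     (0.90, 0.999),
}

def random_individual():
    # One candidate: a value sampled uniformly from each parameter range.
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def crossover(a, b):
    # Uniform crossover: each gene is taken from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in PARAM_RANGES}

def mutate(ind, rate=0.2):
    # With probability `rate`, resample a gene within its range.
    for k, (lo, hi) in PARAM_RANGES.items():
        if random.random() < rate:
            ind[k] = random.uniform(lo, hi)
    return ind

def train_and_evaluate(params):
    # Placeholder fitness: in the real setting this would train DDPG+HER
    # with `params` on a manipulation task and return, e.g., the final
    # success rate. Here it is a dummy score so the sketch runs standalone.
    return -sum((v - (lo + hi) / 2) ** 2
                for v, (lo, hi) in zip(params.values(), PARAM_RANGES.values()))

def genetic_search(pop_size=10, generations=5, elite=2):
    # Evolve a small population, keeping the best `elite` individuals each
    # generation and filling the rest with mutated crossovers of them.
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=train_and_evaluate, reverse=True)
        parents = scored[:elite]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=train_and_evaluate)

if __name__ == "__main__":
    print(genetic_search())

In the paper's setting the fitness evaluation is the expensive step (a full or truncated DDPG+HER training run per candidate), which is why automating the search matters for the reported reduction in learning time.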