Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning

Paper Title

Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning for Robotic Tasks

Authors

Daniel Harnack, Julie Pivin-Bachler, Nicolás Navarro-Guerrero

Abstract

Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persistent problem is that data efficiency can be very low. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which, however, have primarily been tested on discrete grid-world and small-scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time feedback is most beneficial. To resolve these discrepancies, we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, the feedback frequency should be changed as the agent's proficiency in the task increases.
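The abstract's closing recommendation, adapting feedback frequency as the agent's proficiency grows, can be illustrated with a small sketch. This is not the authors' method: the linear schedule, the `p_max`/`p_min` bounds, the sliding success-rate window, and the names `feedback_probability` and `FeedbackScheduler` are all hypothetical, chosen only to show one way a teacher's intervention probability might be tied to a running success rate. Actions are plain lists of floats, standing in for continuous joint commands of a manipulator arm.

```python
import random
from collections import deque


def feedback_probability(success_rate, p_max=0.5, p_min=0.05):
    # Hypothetical linear schedule (an assumption, not from the paper):
    # frequent feedback for a novice agent, sparse feedback once it mostly succeeds.
    s = min(max(success_rate, 0.0), 1.0)
    return p_max - (p_max - p_min) * s


class FeedbackScheduler:
    """Hypothetical helper: tracks recent episode outcomes and decides when a teacher intervenes."""

    def __init__(self, window=100, seed=0):
        # 1 = episode reached the target pose, 0 = it did not; only the last `window` episodes count.
        self.outcomes = deque(maxlen=window)
        self.rng = random.Random(seed)

    def record_episode(self, success):
        self.outcomes.append(1 if success else 0)

    def success_rate(self):
        # With no recorded episodes the agent is treated as a complete novice.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def select_action(self, agent_action, teacher_action):
        # Returns (action to execute, whether teacher feedback was given at this step).
        if self.rng.random() < feedback_probability(self.success_rate()):
            return teacher_action, True
        return agent_action, False


if __name__ == "__main__":
    sched = FeedbackScheduler(window=10)
    for ok in [True] * 8 + [False] * 2:
        sched.record_episode(ok)
    print(sched.success_rate())  # 0.8
    action, gave_feedback = sched.select_action([0.0, 0.1], [0.2, 0.3])
    print(action, gave_feedback)
```

The sliding window is a design choice for this sketch: it lets the feedback probability rise again if performance drops, instead of decaying on a fixed timetable. Whether feedback should replace the agent's action, as here, or take another form (e.g. a reward signal) is left open by the abstract.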
