Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Paper title

Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Authors

Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz, Karol Hausman, Alexander Herzog

Abstract

We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system, such as when a robot must decide on the next action while still performing the previous action. Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed. In order to develop an algorithmic framework for such concurrent control problems, we start with a continuous-time formulation of the Bellman equations, and then discretize them in a way that is aware of system delays. We instantiate this new class of approximate dynamic programming methods via a simple architectural extension to existing value-based deep reinforcement learning algorithms. We evaluate our methods on simulated benchmark tasks and a large-scale robotic grasping task where the robot must "think while moving".
