Paper Title


Path Planning Using Wasserstein Distributionally Robust Deep Q-learning

Authors

Cem Alpturk, Venkatraman Renganathan

Abstract


We investigate the problem of risk averse robot path planning using the deep reinforcement learning and distributionally robust optimization perspectives. Our problem formulation involves modelling the robot as a stochastic linear dynamical system, assuming that a collection of process noise samples is available. We cast the risk averse motion planning problem as a Markov decision process and propose a continuous reward function design that explicitly takes into account the risk of collision with obstacles while encouraging the robot's motion towards the goal. We learn the risk-averse robot control actions through Lipschitz approximated Wasserstein distributionally robust deep Q-learning to hedge against the noise uncertainty. The learned control actions result in a safe and risk averse trajectory from the source to the goal, avoiding all the obstacles. Various supporting numerical simulations are presented to demonstrate our proposed approach.
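The abstract describes a continuous reward that trades off progress toward the goal against the risk of colliding with obstacles. A minimal sketch of one such reward shape is below; the distance-based goal term, the exponential proximity penalty, and all weights are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def reward(state, goal, obstacles, obstacle_scale=1.0, w_goal=1.0, w_risk=5.0):
    """Hypothetical continuous reward for risk-averse path planning.

    A negative distance-to-goal term encourages motion toward the goal,
    while a smooth penalty that grows as the robot approaches any
    obstacle stands in for collision risk. The exponential risk shape
    and the weights are illustrative, not the paper's formulation.
    """
    dist_to_goal = np.linalg.norm(state - goal)
    collision_risk = sum(
        np.exp(-np.linalg.norm(state - obs) / obstacle_scale)
        for obs in obstacles
    )
    return -w_goal * dist_to_goal - w_risk * collision_risk
```

Because the reward is continuous in the state, it supplies a gradient-like learning signal everywhere, rather than only a sparse penalty at the moment of collision.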
