Deep Reinforcement Learning for Real Autonomous Mobile Robot Navigation in Indoor Environments

Paper Title

Deep Reinforcement learning for real autonomous mobile robot navigation in indoor environments

Authors

Hartmut Surmann, Christian Jestel, Robin Marchel, Franziska Musberg, Houssem Elhadj, Mahbube Ardani

Abstract

Deep Reinforcement Learning has been successfully applied in various computer games [8]. However, it is still rarely used in real-world applications, especially for the navigation and continuous control of real mobile robots [13]. Previous approaches lack safety and robustness and/or need a structured environment. In this paper we present our proof of concept for autonomous self-learning robot navigation in an unknown environment for a real robot without a map or planner. The input for the robot is only the fused data from a 2D laser scanner and an RGB-D camera, together with the orientation to the goal. The map of the environment is unknown. The output actions of a GPU-based Asynchronous Advantage Actor-Critic network (GA3C) are the linear and angular velocities for the robot. The navigator/controller network is pretrained in a high-speed, parallel, self-implemented simulation environment to speed up the learning process and is then deployed to the real robot. To avoid overfitting, we train relatively small networks and add random Gaussian noise to the input laser data. Fusing the sensor data with the RGB-D camera allows the robot to navigate in real environments with true 3D obstacle avoidance, without the need to fit the environment to the sensory capabilities of the robot. To further increase robustness, we train on environments of varying difficulty and run 32 training instances simultaneously. Video: supplementary file / YouTube; Code: GitHub
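
The abstract names the input pipeline only briefly: RGB-D camera data fused with the 2D laser scan, random Gaussian noise added to the laser data, and the goal orientation appended. The following is a minimal Python sketch of how such an observation vector could be assembled. It is not the authors' implementation (their code is linked on GitHub). It assumes the depth image has already been projected onto the laser's angular bins and that fusion keeps the closer range per beam; the function names, the 10 m `MAX_RANGE`, and the wrapping of the goal angle to [-pi, pi) are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Optional, Sequence

MAX_RANGE = 10.0  # hypothetical sensor range limit in metres (assumption)


def fuse_scans(laser: Sequence[float], camera: Sequence[Optional[float]]) -> List[float]:
    """Keep the closer of the laser range and the camera-derived range per beam.

    camera[i] is None where the RGB-D camera has no return for beam i.
    Assumes depth points were already projected onto the laser's angular bins.
    """
    if len(laser) != len(camera):
        raise ValueError("laser and camera scans must have the same number of beams")
    return [lr if cr is None else min(lr, cr) for lr, cr in zip(laser, camera)]


def wrap_angle(theta: float) -> float:
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def build_observation(
    laser: Sequence[float],
    camera: Sequence[Optional[float]],
    goal_angle: float,
    noise_std: float = 0.0,
    max_range: float = MAX_RANGE,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return fused (optionally noisy) ranges followed by the goal orientation."""
    fused = fuse_scans(laser, camera)
    if noise_std > 0.0:
        # Training-time augmentation: zero-mean Gaussian noise on every beam.
        rng = rng if rng is not None else random.Random()
        fused = [r + rng.gauss(0.0, noise_std) for r in fused]
    # Keep readings physically plausible: no negative ranges, nothing past max_range.
    ranges = [min(max(r, 0.0), max_range) for r in fused]
    return ranges + [wrap_angle(goal_angle)]


if __name__ == "__main__":
    obs = build_observation([2.0, 1.5, 3.0], [None, 0.8, 3.5], 3 * math.pi / 2)
    print(obs)  # fused ranges [2.0, 0.8, 3.0] followed by approximately -pi/2
```

Taking the per-beam minimum means an obstacle seen only by the camera (for example a table top above the laser plane) still shortens the corresponding range, which is one way a network with a 2D-scan-shaped input can still react to 3D obstacles.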
