Paper Title

Where to Look Next: Learning Viewpoint Recommendations for Informative Trajectory Planning

Authors

Max Lodel, Bruno Brito, Álvaro Serra-Gómez, Laura Ferranti, Robert Babuška, Javier Alonso-Mora

Abstract

Search missions require motion planning and navigation methods for information gathering that continuously replan based on new observations of the robot's surroundings. Current methods for information gathering, such as Monte Carlo Tree Search, are capable of reasoning over long horizons, but they are computationally expensive. An alternative for fast online execution is to train, offline, an information-gathering policy, which indirectly reasons about the information value of new observations. However, these policies lack safety guarantees and do not account for the robot dynamics. To overcome these limitations, we train an information-aware policy via deep reinforcement learning that guides a receding-horizon trajectory optimization planner. In particular, the policy continuously recommends a reference viewpoint to the local planner, such that the resulting dynamically feasible and collision-free trajectories lead to observations that maximize the information gain and reduce the uncertainty about the environment. In simulation tests in previously unseen environments, our method consistently outperforms greedy next-best-view policies and achieves competitive performance compared to Monte Carlo Tree Search, in terms of information gain and coverage time, with a reduction in execution time by three orders of magnitude.
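As a rough illustration of the pipeline the abstract describes (a learned policy recommending reference viewpoints to a receding-horizon local planner), the minimal sketch below closes the loop between viewpoint recommendation, short-horizon planning, and belief updates. It is not the authors' implementation: the learned RL policy is replaced here by a greedy entropy heuristic, the trajectory optimizer by straight-line interpolation, and all names and parameters (`recommend_viewpoint`, `plan_trajectory`, `update_belief`, grid size, horizon) are hypothetical assumptions.

```python
# Minimal sketch of the policy-guided receding-horizon loop from the abstract.
# Every function below is a hypothetical stand-in, not the authors' interface.
import numpy as np

GRID = 32      # side length of the occupancy-belief grid (assumption)
HORIZON = 10   # planning horizon in steps (assumption)

def recommend_viewpoint(belief):
    """Stand-in for the learned policy: pick the most uncertain cell."""
    entropy = -(belief * np.log(belief + 1e-9)
                + (1.0 - belief) * np.log(1.0 - belief + 1e-9))
    return np.unravel_index(np.argmax(entropy), belief.shape)

def plan_trajectory(state, viewpoint):
    """Stand-in for the receding-horizon optimizer: straight-line steps
    toward the viewpoint. A real planner would enforce robot dynamics
    and collision constraints here."""
    goal = np.asarray(viewpoint, dtype=float)
    return np.linspace(state, goal, HORIZON + 1)[1:]

def update_belief(belief, state):
    """Stand-in sensor model: observing a cell drives its occupancy
    probability toward 0, reducing entropy (i.e., gaining information)."""
    r, c = int(round(state[0])), int(round(state[1]))
    belief[r, c] = 0.01
    return belief

belief = np.full((GRID, GRID), 0.5)   # fully uncertain map
state = np.array([0.0, 0.0])          # robot position

for _ in range(200):                            # replanning loop
    viewpoint = recommend_viewpoint(belief)     # policy output
    trajectory = plan_trajectory(state, viewpoint)
    state = trajectory[0]                       # execute only the first step
    belief = update_belief(belief, state)       # incorporate new observation

print("cells still fully uncertain:", int((belief == 0.5).sum()))
```

The structural point the sketch preserves is that the policy only supplies a reference viewpoint at each replanning step; dynamic feasibility and collision avoidance are the local planner's responsibility, which the straight-line stub above deliberately glosses over.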
