Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback

Paper Title

Human-in-the-Loop Robot Planning with Non-Contextual Bandit Feedback

Authors

Yijie Zhou, Yan Zhang, Xusheng Luo, Michael M. Zavlanos

Abstract

In this paper, we consider robot navigation in environments populated by humans. The goal is to determine collision-free and dynamically feasible trajectories that also maximize human satisfaction. Such trajectories may bring the robot close to humans who need help with their work, or keep it away from humans when it could interfere with their sight or work. In practice, human satisfaction is subjective and hard to describe mathematically, so the planning problem we consider may lack important contextual information. To address this challenge, we propose a semi-supervised Bayesian Optimization (BO) method that designs globally optimal robot trajectories using non-contextual bandit human feedback: complaints or satisfaction ratings that express how satisfactory a trajectory is without revealing the reason. Since trajectory planning is typically a high-dimensional optimization problem in the space of waypoints that define a trajectory, BO may require prohibitively many queries for human feedback to return a good solution. To reduce the number of queries, we use an autoencoder to map the high-dimensional problem space to a low-dimensional latent space, which we update using human feedback. Moreover, we improve the exploration efficiency of BO by biasing the search for new trajectories towards dynamically feasible and collision-free trajectories obtained using off-the-shelf motion planners. We demonstrate the efficiency of the proposed trajectory planning method in a scenario with humans who have diverse and unknown demands.
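The loop described in the abstract, BO over a low-dimensional latent space driven only by scalar ratings, can be illustrated with a minimal sketch. This is not the authors' implementation. The `decode` function is a hand-written stand-in for a trained autoencoder decoder, `human_rating` is a simulated human whose preferences are hidden from the optimizer, and the RBF kernel, UCB acquisition rule, grid of candidates, and all parameter values are assumptions chosen for illustration.

```python
import math

def decode(z, start=(0.0, 0.0), goal=(10.0, 0.0), n=20):
    """Hypothetical stand-in for a trained decoder: 2-D latent code -> waypoints."""
    pts = []
    for i in range(n + 1):
        t = i / n
        x = start[0] + t * (goal[0] - start[0])
        # Lateral offset shaped by the latent code; start and goal stay fixed.
        y = (start[1] + t * (goal[1] - start[1])
             + z[0] * math.sin(math.pi * t) + z[1] * math.sin(2 * math.pi * t))
        pts.append((x, y))
    return pts

def human_rating(traj):
    """Simulated bandit feedback (assumption): one score, no reason given."""
    def min_dist(h):
        return min(math.dist(p, h) for p in traj)
    needs_help, wants_space = (3.0, 1.0), (7.0, 1.0)
    return -min_dist(needs_help) + min(min_dist(wants_space), 2.0)

def rbf(a, b, ell=0.5):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * ell ** 2))

def invert(m):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def next_query(Z, y, candidates, beta=2.0, noise=1e-4):
    """Pick the unqueried candidate with the highest GP upper confidence bound."""
    n = len(Z)
    K = [[rbf(Z[i], Z[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Kinv = invert(K)
    mean_y = sum(y) / n
    alpha = [sum(Kinv[i][j] * (y[j] - mean_y) for j in range(n)) for i in range(n)]
    best, best_ucb = None, -math.inf
    for c in candidates:
        if c in Z:
            continue
        k = [rbf(c, z) for z in Z]
        mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
        var = 1.0 - sum(k[i] * Kinv[i][j] * k[j] for i in range(n) for j in range(n))
        ucb = mu + beta * math.sqrt(max(var, 0.0))
        if ucb > best_ucb:
            best, best_ucb = c, ucb
    return best

def optimize(n_queries=15):
    """Run BO in the latent space; return (best code, best rating, all ratings)."""
    grid = [(-2 + 0.4 * i, -2 + 0.4 * j) for i in range(11) for j in range(11)]
    Z = [grid[len(grid) // 2]]          # seed: the near-straight trajectory
    y = [human_rating(decode(Z[0]))]
    for _ in range(n_queries):
        z = next_query(Z, y, grid)
        Z.append(z)
        y.append(human_rating(decode(z)))  # one human query per iteration
    i_best = max(range(len(y)), key=lambda i: y[i])
    return Z[i_best], y[i_best], y
```

Calling `optimize(n_queries=10)` queries the simulated human eleven times in total, counting the seed trajectory. Searching a two-dimensional latent grid instead of the 21 waypoints directly is what keeps the number of human queries small in this toy setting. A fuller version would also retrain the latent space from feedback and bias candidates toward planner-generated trajectories, as the abstract describes.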
