LESS is More: Rethinking Probabilistic Models of Human Behavior

Paper title

LESS is More: Rethinking Probabilistic Models of Human Behavior

Authors

Andreea Bobu, Dexter R. R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan

Abstract

Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations.
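The Boltzmann model described in the abstract can be illustrated with a minimal sketch. It assumes a finite list of candidate options standing in for trajectories and a rationality coefficient `beta`; both are illustrative assumptions that the abstract does not specify, and the function name is hypothetical. The sketch covers only the standard Boltzmann choice rule and the way near-identical options each contribute independently. It is not an implementation of the model the authors propose.

```python
import math


def boltzmann_probabilities(rewards, beta=1.0):
    """Return choice probabilities with P(i) proportional to exp(beta * rewards[i]).

    `rewards` is a list of scalar rewards, one per candidate option.
    `beta` is an assumed rationality coefficient (not given in the abstract).
    """
    scaled = [beta * r for r in rewards]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Two distinct options with equal reward: each is chosen half the time.
two_options = boltzmann_probabilities([1.0, 1.0])

# The first option now appears as three near-identical variants.
# Each variant counts independently, so the group's total share grows
# even though nothing about the underlying choice has changed.
with_duplicates = boltzmann_probabilities([1.0, 1.0, 1.0, 1.0])
share_of_first_group = sum(with_duplicates[:3])
```

With equal rewards, the duplicated option's share rises from one half to three quarters purely because it was listed three times. This is the kind of independence between similar options that the abstract argues is ill-suited to continuous trajectory spaces.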
