Learning to Play Trajectory Games Against Opponents with Unknown Objectives

Paper title

Learning to Play Trajectory Games Against Opponents with Unknown Objectives

Authors

Liu, Xinjie; Peters, Lasse; Alonso-Mora, Javier

Abstract

Many autonomous agents, such as intelligent vehicles, are inherently required to interact with one another. Game theory provides a natural mathematical tool for robot motion planning in such interactive settings. However, tractable algorithms for such problems usually rely on a strong assumption, namely that the objectives of all players in the scene are known. To make such tools applicable for ego-centric planning with only local information, we propose an adaptive model-predictive game solver, which jointly infers other players' objectives online and computes a corresponding generalized Nash equilibrium (GNE) strategy. The adaptivity of our approach is enabled by a differentiable trajectory game solver whose gradient signal is used for maximum likelihood estimation (MLE) of opponents' objectives. This differentiability of our pipeline facilitates direct integration with other differentiable elements, such as neural networks (NNs). Furthermore, in contrast to existing solvers for cost inference in games, our method handles not only partial state observations but also general inequality constraints. In two simulated traffic scenarios, we find superior performance of our approach over both existing game-theoretic methods and non-game-theoretic model-predictive control (MPC) approaches. We also demonstrate our approach's real-time planning capabilities and robustness in two hardware experiments.
