Paper Title
GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting
Paper Authors
Paper Abstract
The task of motion forecasting is critical for self-driving vehicles (SDVs) to be able to plan a safe maneuver. Towards this goal, modern approaches reason about the map, the agents' past trajectories and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this approach is computationally expensive for multi-agent prediction as inference needs to be run for each agent. To tackle the scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper, we propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization. Towards this goal, we leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization allows us to be invariant to scene viewpoint, and save online computation by re-using map embeddings computed offline. Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as a novel highway dataset.
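To make the viewpoint-invariance claim concrete, the following is a minimal sketch, not the paper's implementation. It assumes each agent or map element is summarized by a 2D pose `(x, y, heading)` and expresses every element relative to every other one. The function names (`relative_pose`, `pairwise_relative_poses`, `change_viewpoint`) are hypothetical and chosen for illustration only. In the paper, such pairwise geometry is turned into learned positional encodings on the edges of a heterogeneous spatial graph, which this sketch does not attempt to reproduce.

```python
import math


def relative_pose(src, dst):
    """Pose of dst expressed in the local frame of src.

    Each pose is a tuple (x, y, heading) in some global frame,
    with heading in radians.
    """
    sx, sy, sh = src
    dx, dy = dst[0] - sx, dst[1] - sy
    cos_h, sin_h = math.cos(sh), math.sin(sh)
    # Rotate the global-frame offset by -heading of src.
    local_x = cos_h * dx + sin_h * dy
    local_y = -sin_h * dx + cos_h * dy
    # Heading difference, wrapped to [-pi, pi].
    diff = dst[2] - sh
    dtheta = math.atan2(math.sin(diff), math.cos(diff))
    return (local_x, local_y, dtheta)


def pairwise_relative_poses(poses):
    """Relative pose for every ordered pair (i, j) with i != j."""
    n = len(poses)
    return {
        (i, j): relative_pose(poses[i], poses[j])
        for i in range(n)
        for j in range(n)
        if i != j
    }


def change_viewpoint(poses, tx, ty, angle):
    """Re-express all poses in a different global frame (rigid transform)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty, h + angle)
            for x, y, h in poses]


# Hypothetical scene: three elements with arbitrary poses.
scene = [(0.0, 0.0, 0.0), (10.0, 0.0, math.pi / 2), (-3.0, 4.0, -1.0)]
original = pairwise_relative_poses(scene)
moved = pairwise_relative_poses(change_viewpoint(scene, 50.0, -20.0, 1.2))
```

Because the pairwise quantities depend only on how elements are positioned with respect to each other, `original` and `moved` agree up to floating-point error. Any features built purely from them are therefore unaffected by the choice of global frame, such as the SDV frame.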