Paper Title
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Paper Authors
Paper Abstract
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost. Code to reproduce our results is available at https://github.com/mit-han-lab/bevfusion.
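To make the view-transformation step more concrete, below is a minimal, hypothetical sketch of the BEV pooling idea: camera features lifted to 3D points are scattered into a flat bird's-eye-view grid by summing every feature that lands in the same cell. The tensor shapes, grid size, and coordinate range are assumptions for illustration; this is not the paper's optimized implementation, which removes the latency bottleneck with a much faster specialized kernel.

    # Minimal BEV pooling sketch (illustrative only, hypothetical shapes/ranges).
    import torch

    def bev_pool(points_xyz, feats, grid_size=(128, 128), xy_range=(-51.2, 51.2)):
        """Scatter lifted camera features (N, C) located at 3D points (N, 3)
        into a BEV grid of shape (grid_size[0], grid_size[1], C)."""
        lo, hi = xy_range
        cell = (hi - lo) / grid_size[0]
        # Convert metric x/y coordinates to integer BEV cell indices.
        ix = ((points_xyz[:, 0] - lo) / cell).long().clamp(0, grid_size[0] - 1)
        iy = ((points_xyz[:, 1] - lo) / cell).long().clamp(0, grid_size[1] - 1)
        flat_idx = ix * grid_size[1] + iy                       # (N,)
        bev = feats.new_zeros(grid_size[0] * grid_size[1], feats.shape[1])
        # Sum the features of all points that fall into the same BEV cell.
        bev.index_add_(0, flat_idx, feats)
        return bev.view(grid_size[0], grid_size[1], -1)

    # Toy usage: 10k lifted points carrying 32-channel features.
    pts = torch.rand(10_000, 3) * 102.4 - 51.2
    ft = torch.randn(10_000, 32)
    bev_map = bev_pool(pts, ft)
    print(bev_map.shape)  # torch.Size([128, 128, 32])

The naive scatter above is where the efficiency bottleneck lies: the abstract's reported >40x latency reduction comes from optimizing this pooling step, so that camera and LiDAR features can both be placed into the shared BEV space cheaply before fusion.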