Paper Title

Radar Guided Dynamic Visual Attention for Resource-Efficient RGB Object Detection

Paper Authors

Hemant Kumawat, Saibal Mukhopadhyay

Paper Abstract

An autonomous system's perception engine must provide an accurate understanding of the environment for it to make decisions. Deep-learning-based object detection networks suffer degraded performance and robustness for small and far-away objects because an object's feature map shrinks as we move to higher layers of the network. In this work, we propose a novel radar-guided spatial attention for RGB images to improve the perception quality of autonomous vehicles operating in a dynamic environment. In particular, our method improves the perception of small and long-range objects, which are often missed by object detectors operating on RGB images alone. The proposed method consists of two RGB object detectors, namely the primary detector and a lightweight secondary detector. The primary detector takes the full RGB image and generates primary detections. Next, the radar proposal framework creates regions of interest (ROIs) for object proposals by projecting the radar point cloud onto the 2D RGB image. These ROIs are cropped and fed to the secondary detector to generate secondary detections, which are then fused with the primary detections via non-maximum suppression. This method helps recover small objects by preserving their spatial features through an increase in their receptive field. We evaluate our fusion method on the challenging nuScenes dataset and show that, with SSD-Lite as both the primary and secondary detector, our fusion method improves the recall of the baseline primary YOLOv3 detector by 14% while requiring three times fewer computational resources.
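The pipeline described in the abstract (radar-to-image projection, ROI proposal, secondary detection on crops, NMS fusion) can be sketched in a few functions. The following is a minimal, illustrative Python/NumPy sketch, not the authors' implementation: the detector callables `primary_detector` and `secondary_detector`, the calibration matrices `K` and `T`, the fixed `roi_size`, and the IoU threshold are hypothetical placeholders chosen for clarity.

```python
# Hypothetical sketch of a radar-guided detection pipeline as described in the abstract.
# `primary_detector` / `secondary_detector` stand in for any RGB detectors (e.g. YOLOv3, SSD-Lite)
# that return (boxes, scores); calibration, ROI size, and thresholds are illustrative assumptions.
import numpy as np

def project_radar_to_image(radar_xyz, cam_intrinsics, cam_extrinsics):
    """Project 3D radar points (N, 3) onto the image plane, returning (M, 2) pixel coordinates."""
    pts_h = np.concatenate([radar_xyz, np.ones((radar_xyz.shape[0], 1))], axis=1)  # homogeneous coords
    cam_pts = (cam_extrinsics @ pts_h.T).T[:, :3]          # radar frame -> camera frame
    cam_pts = cam_pts[cam_pts[:, 2] > 0]                   # keep points in front of the camera
    pix = (cam_intrinsics @ cam_pts.T).T
    return pix[:, :2] / pix[:, 2:3]                        # perspective divide

def radar_rois(pixels, image_shape, roi_size=128):
    """Turn projected radar points into square ROIs (x1, y1, x2, y2) clipped to the image."""
    h, w = image_shape[:2]
    half = roi_size // 2
    rois = np.stack([pixels[:, 0] - half, pixels[:, 1] - half,
                     pixels[:, 0] + half, pixels[:, 1] + half], axis=1)
    rois[:, [0, 2]] = rois[:, [0, 2]].clip(0, w - 1)
    rois[:, [1, 3]] = rois[:, [1, 3]].clip(0, h - 1)
    return rois.astype(int)

def nms(boxes, scores, iou_thr=0.5):
    """Plain greedy non-maximum suppression; returns indices of the kept boxes."""
    order, keep = scores.argsort()[::-1], []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return keep

def radar_guided_detect(image, radar_xyz, K, T, primary_detector, secondary_detector):
    """Fuse full-image primary detections with secondary detections on radar-proposed crops."""
    boxes, scores = primary_detector(image)                        # primary pass on the full frame
    for x1, y1, x2, y2 in radar_rois(project_radar_to_image(radar_xyz, K, T), image.shape):
        crop_boxes, crop_scores = secondary_detector(image[y1:y2, x1:x2])
        crop_boxes = crop_boxes + np.array([x1, y1, x1, y1])       # map crop coords back to the frame
        boxes = np.concatenate([boxes, crop_boxes])
        scores = np.concatenate([scores, crop_scores])
    keep = nms(boxes, scores)                                      # fuse primary and secondary detections
    return boxes[keep], scores[keep]
```

Running the secondary detector on radar-proposed crops rather than the full frame is what increases the effective receptive field for small, distant objects: each object occupies a much larger fraction of the cropped input, so its spatial features survive into the higher layers of the lightweight detector.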
