Title
OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection
Authors
Abstract
Compared to typical multi-sensor systems, monocular 3D object detection has attracted much attention due to its simple configuration. However, there is still a significant gap between LiDAR-based and monocular-based methods. In this paper, we find that the ill-posed nature of monocular imagery can lead to depth ambiguity. Specifically, objects with different depths can appear with the same bounding boxes and similar visual features in the 2D image. Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training. To facilitate depth learning, we propose a simple yet effective plug-and-play module, \underline{O}ne \underline{B}ounding Box \underline{M}ultiple \underline{O}bjects (OBMO). Concretely, we add a set of suitable pseudo labels by shifting the 3D bounding box along the viewing frustum. To constrain the pseudo-3D labels to be reasonable, we carefully design two label scoring strategies to represent their quality. In contrast to the original hard depth labels, such soft pseudo labels with quality scores allow the network to learn a reasonable depth range, boosting training stability and thus improving final performance. Extensive experiments on the KITTI and Waymo benchmarks show that our method improves state-of-the-art monocular 3D detectors by a significant margin (the improvements under the moderate setting on the KITTI validation set are $\mathbf{1.82\sim 10.91\%}$ \textbf{mAP in BEV} and $\mathbf{1.18\sim 9.36\%}$ \textbf{mAP in 3D}). Code has been released at \url{https://github.com/mrsempress/OBMO}.
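The core idea — shifting a 3D box along the viewing frustum so its 2D projection stays fixed while its depth changes, and attaching a quality score to each shifted copy — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the depth ratios and the linear scoring rule are assumptions for demonstration, and the paper's two actual scoring strategies differ.

```python
# Hypothetical OBMO-style pseudo-label generation (illustration only).
# Under the pinhole model x_img = fx * x / z + cx, scaling (x, y, z) by a
# common factor leaves the projected 2D location unchanged, so each scaled
# copy shares the original 2D box but has a different depth.

def shift_along_frustum(center, depth_ratio):
    """Scale a camera-frame box center (x, y, z) by depth_ratio."""
    x, y, z = center
    return (x * depth_ratio, y * depth_ratio, z * depth_ratio)

def make_pseudo_labels(center, ratios=(0.95, 1.0, 1.05)):
    """Generate shifted pseudo labels, each with a simple quality score.

    The score is 1.0 at the ground-truth depth and decays linearly with
    the relative depth shift (an assumed stand-in for the paper's
    label-scoring strategies).
    """
    labels = []
    for r in ratios:
        score = max(0.0, 1.0 - abs(r - 1.0) / 0.1)
        labels.append((shift_along_frustum(center, r), score))
    return labels

if __name__ == "__main__":
    for c, s in make_pseudo_labels((2.0, 1.0, 20.0)):
        print(c, round(s, 2))
```

The key invariant is that `x / z` (and `y / z`) is unchanged by the shift, so all pseudo labels project to the same 2D box; only the depth supervision is softened.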