Paper Title

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Paper Authors

Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang

Paper Abstract

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating the 3D pose of distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo-imagery-based 3D detection. The ZoomNet pipeline begins with an ordinary 2D object detection model used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we learn part locations as complementary features to improve robustness against occlusion, and we put forward a 3D fitting score to better estimate 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (a 9.4% improvement in APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% in AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations such as pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
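The adaptive zooming described in the abstract amounts to standard pinhole-camera bookkeeping: cropping a 2D box shifts the principal point, and resizing the crop scales the focal lengths and the shifted principal point, so a disparity map estimated on the zoomed crop can still be back-projected into metric 3D. Below is a minimal NumPy sketch of that geometry, not the authors' released implementation; the function names, the 256x256 output size, and the use of nearest-neighbour resampling are illustrative choices, and the back-projection assumes the left and right crops are zoomed with the same horizontal factor so disparities in crop pixels stay consistent with the adjusted focal length.

```python
import numpy as np

def adaptive_zoom(image, box, K, out_size=(256, 256)):
    """Crop a 2D instance box, resize the crop to a unified resolution,
    and adjust the camera intrinsic matrix to match.

    image:    H x W x 3 array, one view of the stereo pair.
    box:      (x1, y1, x2, y2) pixel corners of the 2D detection.
    K:        3 x 3 intrinsic matrix of the original camera.
    out_size: (width, height) of the zoomed crop; a placeholder value,
              not necessarily the unified resolution used in the paper.
    """
    x1, y1, x2, y2 = box
    sx = out_size[0] / (x2 - x1)   # horizontal zoom factor
    sy = out_size[1] / (y2 - y1)   # vertical zoom factor

    # Cropping shifts the principal point; resizing scales the focal
    # lengths and the shifted principal point.
    K_zoom = K.astype(np.float64)
    K_zoom[0, 0] *= sx                   # fx' = fx * sx
    K_zoom[1, 1] *= sy                   # fy' = fy * sy
    K_zoom[0, 2] = (K[0, 2] - x1) * sx   # cx' = (cx - x1) * sx
    K_zoom[1, 2] = (K[1, 2] - y1) * sy   # cy' = (cy - y1) * sy

    # Nearest-neighbour resampling keeps the sketch dependency-free;
    # a real pipeline would use bilinear interpolation.
    ys = np.clip((np.arange(out_size[1]) / sy + y1).astype(int), 0, image.shape[0] - 1)
    xs = np.clip((np.arange(out_size[0]) / sx + x1).astype(int), 0, image.shape[1] - 1)
    return image[ys][:, xs], K_zoom

def disparity_to_points(disparity, K_zoom, baseline=0.54):
    """Back-project a disparity map estimated on the zoomed crop into a
    dense 3D point cloud using the adjusted intrinsics. The KITTI stereo
    baseline is roughly 0.54 m. Depth is fx' * baseline / disparity,
    which remains valid after zooming because disparity (in crop pixels)
    and fx scale by the same horizontal factor.
    """
    fx, fy = K_zoom[0, 0], K_zoom[1, 1]
    cx, cy = K_zoom[0, 2], K_zoom[1, 2]
    v, u = np.indices(disparity.shape)            # pixel grid of the crop
    z = fx * baseline / np.maximum(disparity, 1e-6)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)           # H x W x 3, camera frame
```

Because the intrinsics are adjusted rather than discarded, a small far-away box and a large nearby box both yield crops at the same resolution whose pixels back-project to correctly scaled 3D points, which is why the zoomed point clouds stay dense even for distant objects.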
