Paper Title


MAP-Gen: An Automated 3D-Box Annotation Flow with Multimodal Attention Point Generator

Authors

Chang Liu, Xiaoyan Qian, Xiaojuan Qi, Edmund Y. Lam, Siew-Chong Tan, Ngai Wong

Abstract


Manually annotating 3D point clouds is laborious and costly, limiting the preparation of training data for deep learning in real-world object detection. While a few previous studies tried to automatically generate 3D bounding boxes from weak labels such as 2D boxes, the quality is sub-optimal compared to human annotators. This work proposes a novel autolabeler, called the multimodal attention point generator (MAP-Gen), which generates high-quality 3D labels from weak 2D boxes. It leverages dense image information to tackle the sparsity of 3D point clouds, thus improving label quality. For each 2D pixel, MAP-Gen predicts its corresponding 3D coordinates by referencing context points based on their 2D semantic or geometric relationships. The generated 3D points densify the original sparse point cloud, which is then passed to an encoder that regresses 3D bounding boxes. Using MAP-Gen, object detection networks weakly supervised by 2D boxes can achieve 94~99% of the performance of those fully supervised by 3D annotations. It is hoped that this newly proposed MAP-Gen autolabeling flow can shed new light on utilizing multimodal information to enrich sparse point clouds.
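To make the per-pixel prediction step in the abstract concrete, the sketch below shows one way an attention-based point generator could work: each pixel inside the 2D box acts as a query that attends over the sparse lidar context points and regresses its own 3D coordinates. This is only a minimal illustration, not the authors' implementation; the class name AttentionPointGenerator, the feature dimensions, and the use of a standard multi-head attention layer are assumptions for exposition.

```python
import torch
import torch.nn as nn


class AttentionPointGenerator(nn.Module):
    """Illustrative sketch: each 2D pixel (query) attends over sparse context
    points (keys/values) carrying image-semantic and geometric cues, and the
    fused feature is regressed to the pixel's 3D coordinates."""

    def __init__(self, pixel_dim=64, point_dim=64, embed_dim=128, num_heads=4):
        super().__init__()
        self.q_proj = nn.Linear(pixel_dim, embed_dim)    # project pixel features to queries
        self.kv_proj = nn.Linear(point_dim, embed_dim)   # project context-point features to keys/values
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.coord_head = nn.Linear(embed_dim, 3)        # predict (x, y, z) per pixel

    def forward(self, pixel_feats, context_feats):
        # pixel_feats:   (B, P, pixel_dim)  features of pixels inside the 2D box
        # context_feats: (B, N, point_dim)  features of the sparse lidar points
        q = self.q_proj(pixel_feats)
        kv = self.kv_proj(context_feats)
        fused, _ = self.attn(q, kv, kv)    # reference context points for each pixel
        return self.coord_head(fused)      # (B, P, 3) predicted 3D coordinates


# Toy usage (hypothetical shapes): 2 objects, 100 pixels in each 2D box,
# 32 sparse context points per object.
gen = AttentionPointGenerator()
pixels = torch.randn(2, 100, 64)
points = torch.randn(2, 32, 64)
new_xyz = gen(pixels, points)   # densified points to append to the raw cloud
print(new_xyz.shape)            # torch.Size([2, 100, 3])
```

In the flow described by the abstract, points produced this way would be concatenated with the original sparse cloud before an encoder regresses the 3D bounding box; that downstream encoder is not shown here.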
