Paper Title

Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

Paper Authors

Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

Paper Abstract

Monocular 3D object detection continues to attract attention due to the cost benefits and wider availability of RGB cameras. Despite the recent advances and the ability to acquire data at scale, annotation cost and complexity still limit the size of 3D object detection datasets in the supervised setting. Self-supervised methods, on the other hand, aim at training deep networks relying on pretext tasks or various consistency constraints. Moreover, other 3D perception tasks (such as depth estimation) have shown the benefits of temporal priors as a self-supervision signal. In this work, we argue that temporal consistency at the level of object poses provides an important supervision signal, given the strong prior on physical motion. Specifically, we propose a self-supervised loss which uses this consistency, in addition to render-and-compare losses, to refine noisy pose predictions and derive high-quality pseudo-labels. To assess the effectiveness of the proposed method, we fine-tune a synthetically trained monocular 3D object detection model using the pseudo-labels that we generated on real data. Evaluation on the standard KITTI3D benchmark demonstrates that our method reaches competitive performance compared to other monocular self-supervised and supervised methods.
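To make the idea of pose-level temporal consistency more concrete, below is a minimal PyTorch sketch of one way such a self-supervision term could look, assuming a constant-velocity motion prior over tracked object centers across three consecutive frames. The function name, the restriction to 3D centers, and the smooth-L1 penalty are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(center_prev, center_curr, center_next):
    """Penalize deviation of the current object center from a
    constant-velocity interpolation of its temporal neighbors.

    Each argument is an (N, 3) tensor of predicted object centers
    (x, y, z) for N objects matched across three consecutive frames.
    Illustrative formulation only; the paper's loss may differ.
    """
    # Constant-velocity prior: the center at time t should lie midway
    # between its positions at t-1 and t+1.
    interpolated_curr = 0.5 * (center_prev + center_next)
    return F.smooth_l1_loss(center_curr, interpolated_curr)

# Toy usage: three noisy center predictions for one tracked object.
prev = torch.tensor([[10.0, 1.5, 30.0]])
curr = torch.tensor([[10.1, 1.5, 29.2]])  # noisy prediction at time t
nxt = torch.tensor([[10.4, 1.5, 28.0]])
print(temporal_consistency_loss(prev, curr, nxt).item())
```

In the paper, a term of this kind is combined with render-and-compare losses to refine noisy pose predictions into the pseudo-labels used for fine-tuning; the sketch above only shows the temporal-consistency component in isolation.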
