论文标题
3D点云表示学习中的蒙版自动编码器
Masked Autoencoders in 3D Point Cloud Representation Learning
论文作者
论文摘要
基于变压器的自我监督表示方法学习方法从未标记的数据集中学习通用功能,以提供有用的网络初始化参数,用于下游任务。最近,基于掩盖3D点云数据的局部表面斑块的自我监督学习的探索量不足。在本文中,我们建议在3D点云表示学习中提出蒙版的自动编码器(缩写为MAE3D),这是一种新颖的自动编码范式,用于自学学习。我们首先将输入点云拆分为补丁,然后掩盖其中的一部分,然后使用我们的补丁嵌入模块提取未掩盖的补丁的功能。其次,我们采用补丁的MAE3D变形金刚学习点云补丁的本地特征,以及补丁之间的高级上下文关系,并完成蒙版补丁的潜在表示。结果,我们将点云重建模块与多任务丢失一起完成,从而完成不完整的点云。我们在Shapenet55上进行了自我监督的预训练,并具有点云完成预文本任务,并在ModelNet40和ScanObjectnn(PB \ _T50 \ _RS,最难的变体)上微调了预训练的模型。全面的实验表明,我们的MAE3D从Point Cloud补丁提取的本地功能对下游分类任务有益,表现优于最先进的方法($ 93.4 \%\%\%\%$和$ 86.2 \%$ $分类精度)。
Transformer-based Self-supervised Representation Learning methods learn generic features from unlabeled datasets for providing useful network initialization parameters for downstream tasks. Recently, self-supervised learning based upon masking local surface patches for 3D point cloud data has been under-explored. In this paper, we propose masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches. Secondly, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches and complete the latent representations of masked patches. We use our Point Cloud Reconstruction Module with multi-task loss to complete the incomplete point cloud as a result. We conduct self-supervised pre-training on ShapeNet55 with the point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB\_T50\_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods ($93.4\%$ and $86.2\%$ classification accuracy, respectively).