Paper Title

Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection

Authors

Hao Chen, Wenyuan Li, Song Chen, Zhenwei Shi

Abstract

Supervised deep learning models depend on massive labeled data. Unfortunately, it is time-consuming and labor-intensive to collect and annotate bitemporal samples containing the desired changes. Transfer learning from pre-trained models is an effective way to alleviate label insufficiency in remote sensing (RS) change detection (CD). We explore the use of semantic information during pre-training. Different from traditional supervised pre-training that learns a mapping from image to label, we incorporate semantic supervision into the self-supervised learning (SSL) framework. Typically, multiple objects of interest (e.g., buildings) are distributed at various locations in an uncurated RS image. Instead of manipulating image-level representations via global pooling, we introduce point-level supervision on per-pixel embeddings to learn spatially sensitive features, thus benefiting downstream dense CD. To achieve this, we obtain multiple points via class-balanced sampling on the overlapped area between views using the semantic mask. We learn an embedding space where background and foreground points are pushed apart, and spatially aligned points across views are pulled together. Our intuition is that the resulting semantically discriminative representations, which are invariant to irrelevant changes (illumination and unconcerned land covers), may help change recognition. We collect large-scale image-mask pairs freely available in the RS community for pre-training. Extensive experiments on three CD datasets verify the effectiveness of our method. Ours significantly outperforms ImageNet pre-training, in-domain supervision, and several SSL methods. Empirical results indicate that our pre-training improves the generalization and data efficiency of the CD model. Notably, we achieve competitive results using only 20% of the training data, compared with the baseline (random initialization) trained on 100% of the data. Our code is available.
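The abstract describes a point-level contrastive objective: class-balanced point sampling on the overlapped area between two views, pulling spatially aligned same-class points together and pushing background and foreground points apart. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the function names (`sample_points`, `point_contrastive_loss`), the temperature `tau`, and the InfoNCE-style formulation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sample_points(mask, n_per_class):
    """Class-balanced sampling: draw equal numbers of background (0)
    and foreground (1) pixel coordinates from the semantic mask."""
    coords, labels = [], []
    for cls in (0, 1):
        ys, xs = torch.nonzero(mask == cls, as_tuple=True)
        pick = torch.randperm(ys.numel())[:n_per_class]
        coords.append(torch.stack([ys[pick], xs[pick]], dim=1))
        labels.append(torch.full((pick.numel(),), cls))
    return torch.cat(coords), torch.cat(labels)

def point_contrastive_loss(z1, z2, coords, labels, tau=0.1):
    """Pull same-class points (including spatially aligned pairs across
    views) together and push background/foreground apart, via an
    InfoNCE-style objective on per-pixel embeddings (illustrative)."""
    # z1, z2: (C, H, W) per-pixel embeddings of two views, assumed
    # aligned on the overlapped area; coords: (N, 2) sampled points.
    e1 = F.normalize(z1[:, coords[:, 0], coords[:, 1]].t(), dim=1)  # (N, C)
    e2 = F.normalize(z2[:, coords[:, 0], coords[:, 1]].t(), dim=1)  # (N, C)
    sim = e1 @ e2.t() / tau                                         # (N, N)
    pos = (labels[:, None] == labels[None, :]).float()              # same-class pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -((pos * log_prob).sum(dim=1) / pos.sum(dim=1)).mean()

# Toy usage with random tensors standing in for real encoder outputs.
if __name__ == "__main__":
    mask = (torch.rand(64, 64) > 0.7).long()   # toy binary semantic mask
    z1, z2 = torch.randn(2, 128, 64, 64)       # per-pixel embeddings of two views
    coords, labels = sample_points(mask, n_per_class=32)
    print(point_contrastive_loss(z1, z2, coords, labels))
```

With same-class pairs as positives, the softmax denominator contains the cross-class pairs, so minimizing the loss simultaneously attracts aligned/same-class points and repels background from foreground, matching the behavior the abstract describes.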
