Geometry-Aware Segmentation of Remote Sensing Images via Implicit Height Estimation

Paper title

Geometry-Aware Segmentation of Remote Sensing Images via Implicit Height Estimation

Authors

Xiang Li, Lingjing Wang, Yi Fang

Abstract

Recent studies have shown that additional elevation data (e.g., a digital surface model, DSM) improves the semantic segmentation of aerial images. However, previous methods mostly adopt 3D elevation information as an additional input. In many real-world applications the corresponding DSM is not available, and the spatial resolution of acquired DSM images usually does not match that of the aerial images. To alleviate this data constraint while still taking advantage of 3D elevation information, we introduce a geometry-aware segmentation model that achieves accurate semantic labeling of aerial images via joint height estimation. Instead of using a single-stream encoder-decoder network for semantic labeling, we design a separate decoder branch that predicts the height map, and we use the DSM images as side supervision to train this branch. In this way, our model does not require a DSM as input yet still benefits from the 3D geometric information during training. Moreover, we develop a new geometry-aware convolution module that fuses the 3D geometric features from the height decoder branch with the 2D contextual features from the semantic segmentation branch. The fused feature embeddings produce geometry-aware segmentation maps with enhanced performance. At inference time the model needs no DSM data and directly predicts the semantic labels in an end-to-end fashion. Experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate the effectiveness of the proposed method for the semantic segmentation of aerial images. The proposed model achieves remarkable performance on both datasets without using any hand-crafted features or post-processing.
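The abstract describes training with two signals: semantic labels for the segmentation branch and DSM heights as side supervision for the height branch. It does not give the loss itself, so the following is only a minimal sketch of what such a joint objective could look like. The L1 form of the height term, the `height_weight` parameter, and the function names `cross_entropy` and `joint_loss` are all assumptions made for illustration, not the authors' formulation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one pixel; logits is a list of class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def joint_loss(seg_logits, seg_labels, pred_heights, dsm_heights, height_weight=1.0):
    """Mean per-pixel cross-entropy plus a weighted mean absolute height error.

    Assumption: the height branch is penalized with an L1 term against the DSM,
    and height_weight balances the two terms. Neither is stated in the abstract.
    """
    n = len(seg_labels)
    seg_term = sum(cross_entropy(l, y) for l, y in zip(seg_logits, seg_labels)) / n
    height_term = sum(abs(p - h) for p, h in zip(pred_heights, dsm_heights)) / n
    return seg_term + height_weight * height_term

# Toy example: two pixels, three classes, heights in arbitrary units.
seg_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
seg_labels = [0, 2]
pred_heights = [1.5, 10.0]   # output of the hypothetical height decoder
dsm_heights = [2.0, 9.0]     # DSM values, used only during training
print(joint_loss(seg_logits, seg_labels, pred_heights, dsm_heights, height_weight=0.5))
```

In this sketch the DSM values appear only in the loss, which mirrors the abstract's point that DSM data is needed for training but not at inference, where only the segmentation output is used.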
