Unsupervised Pansharpening Based on Self-Attention Mechanism

Title

Unsupervised Pansharpening Based on Self-Attention Mechanism

Authors

Qu, Ying; Baghbaderani, Razieh Kaviani; Qi, Hairong; Kwan, Chiman

Abstract

Pansharpening is to fuse a multispectral image (MSI) of low-spatial-resolution (LR) but rich spectral characteristics with a panchromatic image (PAN) of high-spatial-resolution (HR) but poor spectral characteristics. Traditional methods usually inject the extracted high-frequency details from PAN into the up-sampled MSI. Recent deep learning endeavors are mostly supervised assuming the HR MSI is available, which is unrealistic especially for satellite images. Nonetheless, these methods could not fully exploit the rich spectral characteristics in the MSI. Due to the wide existence of mixed pixels in satellite images where each pixel tends to cover more than one constituent material, pansharpening at the subpixel level becomes essential. In this paper, we propose an unsupervised pansharpening (UP) method in a deep-learning framework to address the above challenges based on the self-attention mechanism (SAM), referred to as UP-SAM. The contribution of this paper is three-fold. First, the self-attention mechanism is proposed where the spatial varying detail extraction and injection functions are estimated according to the attention representations indicating spectral characteristics of the MSI with sub-pixel accuracy. Second, such attention representations are derived from mixed pixels with the proposed stacked attention network powered with a stick-breaking structure to meet the physical constraints of mixed pixel formulations. Third, the detail extraction and injection functions are spatial varying based on the attention representations, which largely improves the reconstruction accuracy. Extensive experimental results demonstrate that the proposed approach is able to reconstruct sharper MSI of different types, with more details and less spectral distortion as compared to the state-of-the-art.
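The abstract mentions a stick-breaking structure used to meet the physical constraints of mixed pixels, which are commonly taken to mean that per-pixel material proportions are non-negative and sum to one. The sketch below illustrates only the generic stick-breaking construction under that assumption; it is not the paper's network. The function name `stick_breaking`, the array shapes, and the sigmoid used to produce fractions in (0, 1) are all hypothetical choices made for illustration.

```python
import numpy as np

def stick_breaking(v):
    """Map break fractions v in (0, 1), shape (..., K-1), to K simplex weights.

    Generic construction (illustrative, not the paper's implementation):
    weight_k = v_k * prod_{j<k} (1 - v_j), and the last weight is the
    leftover stick prod_j (1 - v_j).
    """
    v = np.asarray(v, dtype=float)
    # Stick length remaining after each break: (1-v1), (1-v1)(1-v2), ...
    remaining = np.cumprod(1.0 - v, axis=-1)
    # Stick length available before each break: 1, (1-v1), ...
    ones = np.ones(v.shape[:-1] + (1,))
    before = np.concatenate([ones, remaining[..., :-1]], axis=-1)
    head = v * before            # first K-1 weights
    tail = remaining[..., -1:]   # final weight takes what is left
    return np.concatenate([head, tail], axis=-1)

# Hypothetical example: a 4x4 image with 5 break fractions per pixel,
# giving 6 per-pixel weights (assumed shapes, random values).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 5))
v = 1.0 / (1.0 + np.exp(-logits))   # squash into (0, 1)
weights = stick_breaking(v)
```

Because each weight is a product of values in (0, 1), non-negativity holds by construction, and the leftover term makes every pixel's weights sum to one without a separate normalization step.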
