Paper Title
GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling
Paper Authors
Paper Abstract
Existing CNN-based methods for pixel labeling rely heavily on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling networks widely exploit conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and directly pass all feature information across scales without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO), which transfers spatially filtered features to another scale. Specifically, GSTO can work either with or without extra supervision: the unsupervised form learns the gate from the feature itself, while the supervised form is guided by a supervised probability matrix. Both forms of GSTO are lightweight and plug-and-play, and can be flexibly integrated into networks or modules to learn better multi-scale features. In particular, by plugging GSTO into HRNet, we obtain a more powerful backbone (namely GSTO-HRNet) for pixel labeling, which achieves new state-of-the-art results on the COCO benchmark for human pose estimation and on semantic segmentation benchmarks including Cityscapes, LIP and Pascal Context, with negligible extra computational cost. Moreover, experimental results demonstrate that GSTO can also significantly boost the performance of multi-scale feature aggregation modules such as PPM and ASPP. Code will be made available at https://github.com/VDIGPKU/GSTO.
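The abstract describes the unsupervised form of GSTO as a spatial gate, learned from the feature itself, that filters the feature map before the conventional up- or down-sampling step. Below is a minimal PyTorch-style sketch of that idea, assuming a single-channel sigmoid gate predicted by a 1x1 convolution and bilinear resampling; the module name, gate design, and layer choices are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnsupervisedGSTOSketch(nn.Module):
    """Sketch of a gated scale-transfer: a spatial gate predicted from the
    feature itself filters the feature before it is resampled to another scale."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 conv predicting a single-channel spatial gate (illustrative choice)
        self.gate_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x, target_size):
        # Gate in [0, 1], one value per spatial location.
        gate = torch.sigmoid(self.gate_conv(x))
        # Spatially filter the feature, then transfer it to the target scale.
        filtered = x * gate
        return F.interpolate(filtered, size=target_size,
                             mode='bilinear', align_corners=False)


# Usage: transfer a low-resolution feature map up to a higher resolution.
x = torch.randn(2, 64, 32, 32)
gsto = UnsupervisedGSTOSketch(channels=64)
y = gsto(x, target_size=(64, 64))  # shape: (2, 64, 64, 64)
```

The supervised variant described in the abstract would instead derive the gate from a supervised probability matrix (e.g., class or keypoint probabilities) rather than predicting it from the feature alone.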