Paper Title

Transformer Scale Gate for Semantic Segmentation

Paper Authors

Hengcan Shi, Munawar Hayat, Jianfei Cai

Paper Abstract

Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation. Existing transformer-based segmentation models combine features across scales without any selection, so features at sub-optimal scales may degrade segmentation outcomes. Leveraging the inherent properties of Vision Transformers, we propose a simple yet effective module, Transformer Scale Gate (TSG), to optimally combine multi-scale features. TSG exploits cues from the self- and cross-attentions of Vision Transformers for scale selection. TSG is a highly flexible plug-and-play module and can easily be incorporated into any encoder-decoder-based hierarchical vision Transformer architecture. Extensive experiments on the Pascal Context and ADE20K datasets demonstrate that our feature selection strategy achieves consistent gains.
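
To make the gating idea concrete, below is a minimal PyTorch sketch of attention-guided scale selection as described in the abstract: per-pixel weights over several feature scales are predicted from attention-derived cues and used to fuse the scales. The module and argument names (ScaleGate, attn_cues, cue_dim) are illustrative assumptions, not the authors' released implementation; in practice the cues would be distilled from the attention maps of the hierarchical Transformer encoder.

```python
# Minimal sketch (assumption): a gate that fuses multi-scale features using
# per-pixel weights predicted from attention-derived cues. Not the paper's code.
import torch
import torch.nn as nn


class ScaleGate(nn.Module):
    def __init__(self, cue_dim: int, num_scales: int):
        super().__init__()
        # Small head mapping attention cues to one logit per scale.
        self.gate = nn.Linear(cue_dim, num_scales)

    def forward(self, feats: torch.Tensor, attn_cues: torch.Tensor) -> torch.Tensor:
        # feats:     (B, S, N, C)   features from S scales, aligned to N tokens/pixels
        # attn_cues: (B, N, cue_dim) cues distilled from self-/cross-attention maps
        weights = self.gate(attn_cues).softmax(dim=-1)    # (B, N, S) per-pixel scale weights
        weights = weights.permute(0, 2, 1).unsqueeze(-1)  # (B, S, N, 1)
        return (weights * feats).sum(dim=1)               # (B, N, C) fused features


# Usage: fuse 3 scales of 256-dim token features using 64-dim attention cues.
if __name__ == "__main__":
    gate = ScaleGate(cue_dim=64, num_scales=3)
    feats = torch.randn(2, 3, 1024, 256)
    cues = torch.randn(2, 1024, 64)
    fused = gate(feats, cues)
    print(fused.shape)  # torch.Size([2, 1024, 256])
```

Because the gate only consumes per-scale features and a cue tensor, such a module can in principle be dropped between the encoder and decoder of a hierarchical vision Transformer, which matches the plug-and-play claim in the abstract.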
