Semantic Flow for Fast and Accurate Scene Parsing

Paper title

Semantic Flow for Fast and Accurate Scene Parsing

Authors

Li, Xiangtai; You, Ansheng; Zhu, Zhen; Zhao, Houlong; Yang, Maoke; Yang, Kuiyuan; Tong, Yunhai

Abstract

In this paper, we focus on designing an effective method for fast and accurate scene parsing. A common practice for improving performance is to obtain high-resolution feature maps with strong semantic representation. Two widely used strategies, atrous convolutions and feature pyramid fusion, are either computationally intensive or ineffective. Inspired by optical flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) that learns Semantic Flow between feature maps of adjacent levels and broadcasts high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our module into a common feature pyramid structure yields superior performance over other real-time methods, even on lightweight backbone networks such as ResNet-18. Extensive experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K, and CamVid. In particular, our network is the first to achieve 80.4% mIoU on Cityscapes at a frame rate of 26 FPS. The code is available at https://github.com/lxtGH/SFSegNets.
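
The abstract describes the Flow Alignment Module only at a high level, so the following is a minimal sketch of the general idea of flow-based feature warping, not the authors' implementation. It assumes a single-channel coarse feature map and a flow field that is already given; in the paper the flow is learned from the two adjacent feature levels, a step omitted here. The function names `bilinear_sample` and `flow_warp` are hypothetical, and the corner-aligned sampling grid is an assumption. Each output pixel looks up the coarse map at its default upsampling position shifted by the flow offset.

```python
def bilinear_sample(feat, y, x):
    """Sample a 2D grid (list of rows) at fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def flow_warp(coarse, flow, out_h, out_w):
    """Upsample `coarse` to (out_h, out_w).

    flow[i][j] is a (dy, dx) offset, in coarse-grid pixels, added to the
    default sampling position of output pixel (i, j). Assumption: the flow
    is supplied by the caller rather than predicted by a network.
    """
    h, w = len(coarse), len(coarse[0])
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Default (corner-aligned) position of this output pixel in the coarse grid.
            base_y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
            base_x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            dy, dx = flow[i][j]
            row.append(bilinear_sample(coarse, base_y + dy, base_x + dx))
        out.append(row)
    return out


coarse = [[0.0, 1.0], [2.0, 3.0]]
zero_flow = [[(0.0, 0.0) for _ in range(3)] for _ in range(3)]
upsampled = flow_warp(coarse, zero_flow, 3, 3)  # plain bilinear upsampling
```

With an all-zero flow the sketch reduces to ordinary bilinear upsampling; a non-zero offset at a pixel moves its sampling point, which is the degree of freedom a learned flow field would use to align the coarse features with the fine ones.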
