Paper Title


TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

Authors

Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling

Abstract


Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed. Recent approaches mainly devote great efforts to reducing the model size for high efficiency. In this paper, we rethink this problem from a different viewpoint: using knowledge contained in compressed videos. We propose a simple and effective framework, dubbed TapLab, to tap into resources from the compressed domain. Specifically, we design a fast feature warping module using motion vectors for acceleration. To reduce the noise introduced by motion vectors, we design a residual-guided correction module and a residual-guided frame selection module using residuals. TapLab significantly reduces redundant computations of the state-of-the-art fast semantic image segmentation models, running 3 to 10 times faster with controllable accuracy degradation. The experimental results show that TapLab achieves 70.6% mIoU on the Cityscapes dataset at 99.8 FPS with a single GPU card for the 1024x2048 videos. A high-speed version even reaches the speed of 160+ FPS. Codes will be available soon at https://github.com/Sixkplus/TapLab.
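The core idea behind the fast feature warping module, reusing the motion vectors already stored in a compressed video to propagate features from a fully-processed frame to subsequent frames, can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: the function name, block size, and nearest-neighbor boundary clamping are assumptions, and real codecs store motion vectors in more complex layouts.

```python
import numpy as np

def warp_features(prev_feat, motion_vectors, block_size=16):
    """Warp a feature map forward using block-wise motion vectors.

    prev_feat:      (H, W, C) feature map from the last fully-inferred frame.
    motion_vectors: (H // block_size, W // block_size, 2) integer (dy, dx)
                    offsets per block, pointing back into the previous frame
                    (as decoded from the compressed bitstream).
    Returns a warped (H, W, C) feature map for the current frame.
    """
    H, W, _ = prev_feat.shape
    warped = np.empty_like(prev_feat)
    for by in range(H // block_size):
        for bx in range(W // block_size):
            dy, dx = motion_vectors[by, bx]
            y0, x0 = by * block_size, bx * block_size
            # Source coordinates in the previous frame, clamped to bounds.
            ys = np.clip(np.arange(y0, y0 + block_size) + dy, 0, H - 1)
            xs = np.clip(np.arange(x0, x0 + block_size) + dx, 0, W - 1)
            warped[y0:y0 + block_size, x0:x0 + block_size] = \
                prev_feat[ys][:, xs]
    return warped
```

Because this copy-and-shift is far cheaper than a full network forward pass, most frames can be handled by warping alone, which is where the reported 3-10x speedup over per-frame inference comes from; the residual-guided modules in the paper then correct regions where the motion vectors are too noisy.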
