Paper Title
Temporally Distributed Networks for Fast Video Semantic Segmentation
Paper Authors
Paper Abstract
We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefore, at each time step, we only need to perform a lightweight computation to extract a group of sub-features from a single sub-network. The full features used for segmentation are then recomposed by applying a novel attention propagation module that compensates for geometric deformation between frames. A grouped knowledge distillation loss is also introduced to further improve the representation power at both the full-feature and sub-feature levels. Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
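The abstract compresses the mechanism into a few sentences; the following is a minimal PyTorch sketch of that idea, not the authors' implementation. It assumes m ≥ 2 generic shallow backbones and uses a standard query-key-value (non-local) attention as a stand-in for the paper's attention propagation module; all names (`TDNetSketch`, `AttentionPropagation`, `key_dim`) and shapes are hypothetical.

```python
import torch
import torch.nn as nn


class AttentionPropagation(nn.Module):
    """Query-key-value attention that warps a cached sub-feature map from a
    past frame into the geometry of the current frame. A generic non-local
    stand-in for the paper's attention propagation module."""

    def __init__(self, channels: int, key_dim: int = 64):
        super().__init__()
        self.query = nn.Conv2d(channels, key_dim, kernel_size=1)
        self.key = nn.Conv2d(channels, key_dim, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, cur_feat: torch.Tensor, past_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = cur_feat.shape
        q = self.query(cur_feat).flatten(2).transpose(1, 2)   # B x HW x d
        k = self.key(past_feat).flatten(2)                    # B x d x HW
        v = self.value(past_feat).flatten(2).transpose(1, 2)  # B x HW x C
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)


class TDNetSketch(nn.Module):
    """At each time step only one shallow sub-network runs on the new frame;
    its sub-features are fused with attention-propagated sub-features cached
    from the previous m-1 frames to recompose the full representation."""

    def __init__(self, sub_nets, num_classes: int, channels: int = 128):
        super().__init__()
        assert len(sub_nets) >= 2, "this sketch assumes m >= 2 sub-networks"
        self.sub_nets = nn.ModuleList(sub_nets)  # m shallow backbones
        self.m = len(sub_nets)
        self.propagate = AttentionPropagation(channels)
        self.classifier = nn.Conv2d(channels * self.m, num_classes, kernel_size=1)
        self.cache = []  # sub-features of the previous m-1 frames

    def forward(self, frame: torch.Tensor, t: int) -> torch.Tensor:
        cur = self.sub_nets[t % self.m](frame)  # lightweight per-step cost
        feats = [self.propagate(cur, f) for f in self.cache] + [cur]
        while len(feats) < self.m:              # warm-up: cache not yet full
            feats.append(cur)
        self.cache = (self.cache + [cur.detach()])[-(self.m - 1):]
        return self.classifier(torch.cat(feats, dim=1))


# Toy usage with hypothetical shallow backbones (m = 2):
subnets = [nn.Sequential(nn.Conv2d(3, 128, 3, stride=4, padding=1), nn.ReLU())
           for _ in range(2)]
model = TDNetSketch(subnets, num_classes=19)  # 19 = Cityscapes classes
video = torch.randn(8, 1, 3, 128, 256)        # T x B x C x H x W
for t, frame in enumerate(video):
    logits = model(frame, t)                  # 1 x 19 x 32 x 64 per step
```

Each step pays only the cost of one shallow backbone plus the attention-based fusion, which is how the distributed design trades a single deep per-frame network for lower per-frame latency; the grouped knowledge distillation loss described in the abstract is a training-time addition and is omitted here.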