Paper Title
Learning Fast and Robust Target Models for Video Object Segmentation
Paper Authors
Abstract
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. The main difficulty is to effectively handle appearance changes and similar background objects, while maintaining accurate segmentation. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. More recent methods integrate generative target appearance models, but either achieve limited robustness or require large amounts of training data. We propose a novel VOS architecture consisting of two network components. The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation. The segmentation model is exclusively trained offline, designed to process the coarse scores into high quality segmentation masks. Our method is fast, easily trainable and remains highly effective in cases of limited training data. We perform extensive experiments on the challenging YouTube-VOS and DAVIS datasets. Our network achieves favorable performance, while operating at higher frame-rates compared to state-of-the-art. Code and trained models are available at https://github.com/andr345/frtm-vos.
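To make the two-component idea concrete, here is a minimal, hypothetical sketch of the online part: a light-weight linear target model fit to first-frame features with a few fast optimization steps, producing coarse per-pixel target scores. The function names, the plain gradient-descent solver, and the toy 2-D features are illustrative assumptions, not the authors' actual implementation (which uses a convolutional module and a Gauss-Newton-style solver); the offline-trained refinement network that would turn these coarse scores into a final mask is omitted.

```python
import numpy as np

def fit_target_model(feats, labels, steps=20, lr=0.1, reg=1e-2):
    """Fit w minimizing ||X w - y||^2 / N + reg * ||w||^2 by gradient descent.

    feats: (N, C) per-pixel features; labels: (N,) binary target mask.
    A few cheap steps stand in for the paper's fast online optimization.
    """
    X, y = feats, labels.astype(np.float64)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + reg * w
        w -= lr * grad
    return w

def coarse_scores(feats, w):
    # Coarse but robust target scores; in the paper, an offline-trained
    # segmentation network refines these into a high-quality mask.
    return feats @ w

# Toy data: target and background pixels with distinct feature directions.
rng = np.random.default_rng(0)
fg = rng.normal([2.0, 0.0], 0.3, size=(50, 2))   # target pixels
bg = rng.normal([0.0, 2.0], 0.3, size=(50, 2))   # background pixels
feats = np.vstack([fg, bg])
mask = np.concatenate([np.ones(50), np.zeros(50)])

w = fit_target_model(feats, mask)
pred = (coarse_scores(feats, w) > 0.5).astype(int)
print("coarse-model accuracy:", (pred == mask).mean())
```

Because the online model is a small linear (or, in the paper, shallow convolutional) module rather than a fine-tuned deep network, it can be re-optimized in milliseconds as the target's appearance changes, which is what enables the high frame-rates reported in the abstract.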