Paper Title
Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks
Paper Authors
Paper Abstract
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS). The proposed AGNN recasts this task as a process of iterative information fusion over video graphs. Specifically, AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges. The underlying pairwise relations are described by a differentiable attention mechanism. Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation. Experimental results on three video segmentation datasets show that AGNN sets a new state of the art in each case. To further demonstrate the generalizability of our framework, we extend AGNN to an additional task: image object co-segmentation (IOCS). We conduct experiments on two widely used IOCS datasets and again observe the superiority of our AGNN model. These extensive experiments verify that AGNN is able to learn the underlying semantic/appearance relationships among video frames or related images, and to discover the common objects.
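The core computation the abstract describes (frames as nodes of a fully connected graph, pairwise edges produced by a differentiable attention mechanism, and iterative parametric message passing) can be illustrated with a minimal sketch. The code below assumes vector-valued per-frame features and a plain GRU update for simplicity; the paper's actual AGNN operates on convolutional feature maps with ConvGRU-style updates and a segmentation readout, so the class and parameter names here are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveGNNLayer(nn.Module):
    """One round of attentive message passing over a fully connected
    frame graph: every node (frame) attends to every other node, and
    the aggregated messages update the node state via a GRU cell."""

    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)  # bilinear attention weights
        self.msg = nn.Linear(dim, dim)            # message transform
        self.gru = nn.GRUCell(dim, dim)           # node-state update

    def forward(self, h):
        # h: (N, dim) node states, one row per video frame.
        # Pairwise affinities e[i, j] ~ h_i^T W h_j for all frame pairs
        # (a differentiable attention over edges of the full graph).
        e = h @ self.W(h).t()                     # (N, N)
        # Mask self-loops so messages come only from the other frames.
        mask = torch.eye(len(h), dtype=torch.bool)
        e = e.masked_fill(mask, float('-inf'))
        a = F.softmax(e, dim=1)                   # edge attention weights
        m = a @ self.msg(h)                       # aggregated messages (N, dim)
        return self.gru(m, h)                     # fused node states

# Iterative information fusion: K rounds of message passing let each
# node absorb higher-order relations beyond its direct pairwise edges.
frames = torch.randn(5, 256)                      # 5 frames, 256-d features
layer = AttentiveGNNLayer(256)
h = frames
for _ in range(3):                                # K = 3 iterations
    h = layer(h)
# A readout head (omitted) would map each final node state to a
# per-frame foreground mask; for IOCS the "frames" are related images.
```

The self-loop mask and the softmax-normalized edges reflect the fusion idea in the abstract: each frame's estimate is refined from evidence gathered across all other frames, which is what allows common objects to emerge.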