Paper Title

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings

Authors

Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Abstract

Significant progress has been made recently in developing few-shot object segmentation methods. Learning has been shown to be successful in few-shot segmentation settings using pixel-level, scribble, and bounding-box supervision. This paper takes another approach, i.e., only requiring image-level labels for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism with both visual and word embeddings. Our model using image-level labels achieves a 4.8% improvement over the previously proposed image-level few-shot object segmentation method. It also outperforms state-of-the-art methods that use weak bounding-box supervision on PASCAL-5i. Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual-semantic processing with weak image-level labels. We further propose a novel setup, Temporal Object Segmentation for Few-shot Learning (TOSFL), for videos. TOSFL can be used on a variety of public video data such as YouTube-VOS, as demonstrated in both instance-level and category-level TOSFL experiments.
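To make the co-attention idea concrete, below is a minimal numpy sketch of attending between flattened query and support feature maps while conditioning both on a class word embedding. All function names, the additive semantic injection, and the bilinear affinity form are illustrative assumptions; the paper's actual multi-modal interaction module is defined by its architecture, not by this simplification.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(query_feats, support_feats, word_emb, W):
    """Hypothetical co-attention between query and support features,
    conditioned on a word embedding (not the paper's exact module).

    query_feats:   (Nq, C) flattened query spatial features
    support_feats: (Ns, C) flattened support spatial features
    word_emb:      (C,)    semantic embedding of the class label
    W:             (C, C)  learnable bilinear weight (assumed form)
    """
    # Inject semantic information by adding the word embedding to both streams
    # (one simple choice; other fusion schemes are possible).
    q = query_feats + word_emb
    s = support_feats + word_emb
    # Affinity between every query location and every support location.
    affinity = q @ W @ s.T                     # (Nq, Ns)
    # Each query location attends over support locations, and vice versa.
    attn_q = softmax(affinity, axis=1) @ s     # (Nq, C) support-aware query features
    attn_s = softmax(affinity, axis=0).T @ q   # (Ns, C) query-aware support features
    return attn_q, attn_s
```

The two attended outputs can then be concatenated with the original features and passed through further (stacked) processing, mirroring the stacked joint visual-semantic processing mentioned in the abstract.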
