Paper Title


Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

Authors

Aneeshan Sain, Ayan Kumar Bhunia, Yongxin Yang, Tao Xiang, Yi-Zhe Song

Abstract


Sketch as an image search query is an ideal alternative to text in capturing fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date, that is, they are hierarchical in terms of the levels of detail -- a person typically sketches up to various extents of detail to depict an object. This hierarchical structure is often visually distinct. In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level to form a better embedding space to conduct retrieval. Experiments on common benchmarks show our method to outperform state-of-the-art methods by a significant margin.
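To give a rough intuition for the co-attention step mentioned in the abstract, below is a minimal NumPy sketch of a generic cross-modal co-attention: each sketch feature attends over the photo features (and vice versa) through a learned bilinear affinity, and each modality is enriched with the resulting cross-modal context. All names, shapes, and the bilinear form are illustrative assumptions; the paper's actual network (hierarchy construction and node fusion at each level) is substantially more involved.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(sketch, photo, W):
    """Generic bilinear co-attention (illustrative, not the paper's exact layer).

    sketch: (n, d) region features from the sketch branch
    photo:  (m, d) region features from the photo branch
    W:      (d, d) learned bilinear affinity weights (random here)
    """
    # affinity between every sketch-region / photo-region pair
    A = sketch @ W @ photo.T                    # (n, m)
    # each sketch feature attends over photo features, and vice versa
    sketch_ctx = softmax(A, axis=1) @ photo     # (n, d)
    photo_ctx = softmax(A.T, axis=1) @ sketch   # (m, d)
    # residual enrichment of each modality with cross-modal context
    return sketch + sketch_ctx, photo + photo_ctx

rng = np.random.default_rng(0)
n, m, d = 4, 6, 8                               # toy sizes, chosen arbitrarily
S = rng.standard_normal((n, d))
P = rng.standard_normal((m, d))
W = rng.standard_normal((d, d))
S_enr, P_enr = co_attention(S, P, W)
print(S_enr.shape, P_enr.shape)                 # (4, 8) (6, 8)
```

The enriched features keep their original shapes, so they can feed directly into whatever per-level matching or fusion follows in the retrieval pipeline.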
