SubICap: Towards Subword-informed Image Captioning

Paper title

SubICap: Towards Subword-informed Image Captioning

Authors

Naeha Sharif, Mohammed Bennamoun, Wei Liu, Syed Afaq Ali Shah

Abstract

Existing Image Captioning (IC) systems model words as atomic units in captions and are unable to exploit the structural information within words. This makes rare words very difficult to represent and out-of-vocabulary words impossible to represent. Moreover, to avoid computational complexity, existing IC models operate over a modest-sized vocabulary of frequent words, so the identity of rare words is lost. In this work, we address this common limitation of IC systems in dealing with rare words in the corpora. We decompose words into smaller constituent units, 'subwords', and represent captions as a sequence of subwords instead of words. This helps represent all words in the corpora using a significantly smaller subword vocabulary, leading to better parameter learning. Using subword language modeling, our captioning system improves various metric scores, with a training vocabulary approximately 90% smaller than that of the baseline and various state-of-the-art word-level models. Our quantitative and qualitative results and analysis demonstrate the efficacy of our proposed approach.
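
To make the idea of representing a caption as a sequence of subwords concrete, here is a minimal sketch. The abstract does not say which segmentation algorithm the paper uses, so this example assumes a toy byte-pair-encoding (BPE) learner as a stand-in; the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the sample captions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subwords by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical captions standing in for a captioning corpus.
captions = ["a man riding a surfboard", "a man surfing", "a woman riding a skateboard"]
words = [w for caption in captions for w in caption.split()]
merges = learn_bpe(words, num_merges=10)

# A word absent from the training captions is still representable as subwords.
print(segment("skateboarding", merges))
```

Because every word can fall back to smaller units (down to single characters), a word that never appears in the training captions can still be written as a sequence of known symbols, which is the property the abstract relies on for rare and out-of-vocabulary words.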
