Analysis of diversity-accuracy tradeoff in image captioning

Paper title

Analysis of diversity-accuracy tradeoff in image captioning

Authors

Ruotian Luo, Gregory Shakhnarovich

Abstract

We investigate the effect of different model architectures, training objectives, hyperparameter settings, and decoding procedures on the diversity of automatically generated image captions. Our results show that 1) simple decoding by naive sampling, coupled with low temperature, is a competitive and fast method for producing diverse and accurate caption sets; 2) training with a CIDEr-based reward using reinforcement learning harms the diversity properties of the resulting generator, and this cannot be mitigated by manipulating decoding parameters. In addition, we propose a new metric, AllSPICE, for evaluating both the accuracy and the diversity of a set of captions with a single value.
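As an illustration of the "naive sampling with low temperature" decoding mentioned in the abstract, the following is a minimal sketch of temperature-scaled sampling for a single decoding step. It is not the authors' implementation: the function names `softmax_with_temperature` and `sample_token`, the example logits, and the temperature values are all assumptions made for this example. The idea is that dividing the logits by a temperature below 1 sharpens the distribution toward the most likely token, while sampling still leaves room for variation across captions.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=None):
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token scores for a three-word vocabulary.
    logits = [2.0, 1.0, 0.0]
    for t in (1.0, 0.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    print(sample_token(logits, 0.5, random.Random(0)))
```

In a full captioning model this step would be repeated token by token, and several captions would be sampled per image to form the caption set whose accuracy and diversity are then evaluated.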
