Paper title
On the use of human reference data for evaluating automatic image descriptions
Authors
Abstract
Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, Meteor, CIDEr, etc.). Thus, the quality of both the systems and the evaluation depends on the quality of the descriptions. As Section 2 will show, the quality of current image description datasets is insufficient. I argue that there is a need for more detailed guidelines that take into account both the needs of visually impaired users and the feasibility of generating suitable descriptions. With high-quality data, evaluation of image description systems could use reference descriptions, but we should also look for alternatives.