Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Paper Title

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Authors

Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

Abstract

Image captioning has recently demonstrated impressive progress, largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Often, work in this field is motivated by the promise of deploying captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets limits the utility of systems trained on these datasets as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems.
