Paper Title
iCap: Interactive Image Captioning with Predictive Text
Authors
Abstract
In this paper we study a brand new topic: interactive image captioning with a human in the loop. Unlike automated image captioning, where a given test image is the sole input at the inference stage, the interactive scenario gives us access to both the test image and a sequence of (incomplete) user-input sentences. We formulate the problem as Visually Conditioned Sentence Completion (VCSC). For VCSC, we propose asynchronous bidirectional decoding for image caption completion (ABD-Cap). With ABD-Cap as the core module, we build iCap, a web-based interactive image captioning system capable of predicting new text in response to live input from a user. Experiments covering both automated evaluations and real user studies show the viability of our proposals.
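To make the VCSC setting concrete, the following is a minimal sketch of the task interface only: given an image and the words a user has typed so far, extend the sentence word by word. It is not the ABD-Cap method, whose details the abstract does not give. The names `complete_caption`, `toy_next_word`, and `TOY_MODEL` are hypothetical, and a hand-written lookup table stands in for a trained image-conditioned language model.

```python
from typing import Callable, Dict, List

END = "<eos>"    # hypothetical end-of-sentence marker
START = "<bos>"  # hypothetical start-of-sentence marker

# Hypothetical stand-in for a trained captioning model:
# for each image, map the previous word to the next word.
TOY_MODEL: Dict[str, Dict[str, str]] = {
    "img_dog": {
        START: "a",
        "a": "dog",
        "dog": "runs",
        "runs": "on",
        "on": "grass",
        "grass": END,
    },
}


def toy_next_word(image_id: str, words: List[str]) -> str:
    """Predict the next word from the image and the last word typed so far."""
    last = words[-1] if words else START
    # Unknown context: stop instead of guessing.
    return TOY_MODEL[image_id].get(last, END)


def complete_caption(
    image_id: str,
    prefix: List[str],
    next_word: Callable[[str, List[str]], str],
    max_len: int = 20,
) -> List[str]:
    """Greedily extend the user's prefix until END or max_len words."""
    words = list(prefix)  # never modify the user's input in place
    while len(words) < max_len:
        word = next_word(image_id, words)
        if word == END:
            break
        words.append(word)
    return words


if __name__ == "__main__":
    # The user has typed "a dog"; the system proposes the rest.
    print(" ".join(complete_caption("img_dog", ["a", "dog"], toy_next_word)))
```

In an interactive system this completion step would be re-run each time the user edits the prefix, so the suggestion always stays consistent with what has been typed.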