Paper Title
Simultaneous Machine Translation with Visual Context
Paper Authors
Paper Abstract
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks. Our results show that visual context is helpful and that visually-grounded models based on explicit object region information are much better than commonly used global features, reaching up to 3 BLEU points improvement under low latency scenarios. Our qualitative analysis illustrates cases where only the multimodal systems are able to translate correctly from English into gender-marked languages, as well as deal with differences in word order, such as adjective-noun placement between English and French.
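To make the "read progressively" setting concrete: a common scheduling policy in SiMT is wait-k, where the decoder lags k tokens behind the source reader, emitting one target token per newly read source token. The following is a minimal illustrative sketch, not the paper's actual system; `translate_step` is a hypothetical stand-in for one decoder step of a trained model.

```python
def wait_k_policy(source_tokens, k, translate_step):
    """Simulate a wait-k simultaneous decoding schedule.

    Reads the source one token at a time; after k tokens have been
    read, emits one target token for each additional source token.
    `translate_step(prefix, target)` is a placeholder for a single
    decoder step conditioned on the source prefix read so far.
    """
    target = []
    for t in range(len(source_tokens)):
        prefix = source_tokens[: t + 1]  # source read so far
        if t + 1 >= k:
            # decoder has waited k tokens: emit one target token
            target.append(translate_step(prefix, target))
    # source exhausted: flush the remaining target tokens
    while len(target) < len(source_tokens):
        target.append(translate_step(source_tokens, target))
    return target


# Toy "model": copy the aligned source token in upper case.
def toy_step(prefix, target):
    return prefix[len(target)].upper()
```

With `k=2` the first target token is only emitted after two source tokens have been read, which is exactly the low-latency regime where the paper reports that visual context helps most: early decoder steps must anticipate content the text prefix does not yet contain.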