Paper Title

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

Paper Authors

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou

Paper Abstract

Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and thus significantly accelerates the inference process. However, NAT discards the dependency information within a sentence and inevitably suffers from the multi-modality problem: the target tokens may be drawn from different possible translations, often causing repeated or missing tokens. To alleviate this problem, we propose a novel semi-autoregressive model, RecoverSAT, which generates a translation as a sequence of segments. The segments are generated simultaneously, while each segment is predicted token-by-token. By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repeated-token and missing-token errors. Experimental results on three widely used benchmark datasets show that our proposed model achieves more than a 4$\times$ speedup while maintaining performance comparable to the corresponding autoregressive model.
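
The abstract outlines the core decoding idea: several segments are decoded in parallel, each segment grows token-by-token, a segment ends itself when it emits an end-of-segment token (dynamic segment length), and a segment that turns out to repeat content already covered elsewhere emits a deletion token so it can be discarded. The sketch below illustrates this recovery mechanism under stated assumptions; the `decode` function, the `model.predict_next` interface, and the `EOS`/`DEL` token names are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of segment-wise semi-autoregressive decoding with recovery,
# assuming a hypothetical model interface; not the authors' implementation.

EOS = "<eos>"  # hypothetical end-of-segment token: lets a segment choose its own length
DEL = "<del>"  # hypothetical deletion token: marks a whole segment as repetitive

def decode(model, source, num_segments, max_len=50):
    """Decode `num_segments` segments in parallel; each segment is extended
    token-by-token, conditioned on the source and on all partial segments."""
    segments = [[] for _ in range(num_segments)]
    finished = [False] * num_segments
    deleted = [False] * num_segments

    for _ in range(max_len):
        if all(finished):
            break
        # One parallel decoding step: every unfinished segment predicts its next token.
        next_tokens = model.predict_next(source, segments, finished)  # hypothetical API
        for i, tok in enumerate(next_tokens):
            if finished[i]:
                continue
            if tok == EOS:        # dynamic segment length: this segment stops here
                finished[i] = True
            elif tok == DEL:      # recovery from repetition: drop this segment entirely
                finished[i] = True
                deleted[i] = True
            else:
                segments[i].append(tok)

    # Concatenate the surviving segments into the final translation.
    return [tok for i, seg in enumerate(segments) if not deleted[i] for tok in seg]
```

In this sketch, discarding a whole segment recovers from repeated-token errors, while dynamic segment lengths let the remaining segments cover content that would otherwise be missing, matching the two failure modes the abstract attributes to multi-modality.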
