Paper Title
Non-Autoregressive Machine Translation with Disentangled Context Transformer
Paper Authors
Paper Abstract
State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo) transformer, that simultaneously generates all tokens given different contexts. The DisCo transformer is trained to predict every output token given an arbitrary subset of the other reference tokens. We also develop the parallel easy-first inference algorithm, which iteratively refines every token in parallel and reduces the number of required iterations. Our extensive experiments on 7 translation directions with varying data sizes demonstrate that our model achieves competitive, if not better, performance compared to the state of the art in non-autoregressive machine translation while significantly reducing decoding time on average. Our code is available at https://github.com/facebookresearch/DisCo.
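To make the parallel easy-first idea concrete, the Python sketch below shows one plausible refinement loop: every position is re-predicted in parallel, and in the next iteration each position conditions only on the positions that were predicted with higher confidence. This is a minimal illustration, not the released implementation; the model(src, tokens, context_mask) callable, the mask-token id, and the fixed target length are hypothetical placeholders.

import torch


def parallel_easy_first_decode(model, src, tgt_len, mask_id=0, max_iters=10):
    """Sketch of parallel easy-first refinement for a DisCo-style model.

    `model(src, tokens, context_mask)` is assumed to return per-position
    log-probabilities of shape (tgt_len, vocab_size), where
    context_mask[i, j] = True means position i may condition on the
    current token at position j.
    """
    # Iteration 1: every position is predicted from the source alone.
    tokens = torch.full((tgt_len,), mask_id, dtype=torch.long)
    context_mask = torch.zeros(tgt_len, tgt_len, dtype=torch.bool)

    for _ in range(max_iters):
        log_probs = model(src, tokens, context_mask)   # (tgt_len, vocab_size)
        scores, new_tokens = log_probs.max(dim=-1)     # per-position confidence and argmax

        if torch.equal(new_tokens, tokens):            # no token changed: stop early
            break
        tokens = new_tokens

        # Rank positions by confidence ("easy" first); next iteration, each
        # position conditions only on positions that were easier than itself.
        order = scores.argsort(descending=True)
        rank = torch.empty_like(order)
        rank[order] = torch.arange(tgt_len)
        context_mask = rank.unsqueeze(1) > rank.unsqueeze(0)

    return tokens

Because all positions are updated in parallel and the loop exits as soon as the predictions stop changing, the number of full passes over the target is typically far smaller than the target length, which is where the decoding-time savings described in the abstract come from.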