Paper Title

Revisiting Robust Neural Machine Translation: A Transformer Case Study

Paper Authors

Peyman Passban, Puneeth S. M. Saladi, Qun Liu

Paper Abstract

Transformers (Vaswani et al., 2017) have brought a remarkable improvement in the performance of neural machine translation (NMT) systems, but they can be surprisingly vulnerable to noise. In this work, we investigate how noise breaks Transformers and whether there exist solutions to deal with such issues. There is a large body of work in the NMT literature on analyzing the behavior of conventional models for the problem of noise, but Transformers are relatively understudied in this context. Motivated by this, we introduce a novel data-driven technique called Target Augmented Fine-tuning (TAFT) to incorporate noise during training. This idea is comparable to the well-known fine-tuning strategy. Moreover, we propose two other novel extensions to the original Transformer, Controlled Denoising (CD) and Dual-Channel Decoding (DCD), which modify the neural architecture as well as the training process to handle noise. One important characteristic of our techniques is that they only impact the training phase and do not impose any overhead at inference time. We evaluated our techniques on the English-German pair in both directions and observed that our models have a higher tolerance to noise. More specifically, they perform with no deterioration when up to 10% of all test words are infected by noise.
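The abstract does not spell out the noise model, but its headline result concerns tolerance when up to 10% of test words are noisy. As a rough illustration of the kind of word-level corruption such robustness experiments rely on, below is a minimal, hypothetical noise-injection sketch in Python; the function names and the character-level operations (swap, drop, repeat) are assumptions for illustration, not the authors' implementation of TAFT, CD, or DCD.

```python
import random

def corrupt_word(word: str, rng: random.Random) -> str:
    """Apply one random character-level perturbation (swap, drop, or repeat).

    Hypothetical noise operations; the paper may use a different noise model.
    """
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "drop", "repeat"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":     # transpose two adjacent characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":     # delete one character
        return word[:i] + word[i + 1:]
    return word[:i] + word[i] + word[i:]  # duplicate one character

def inject_noise(sentence: str, ratio: float = 0.1, seed: int = 0) -> str:
    """Corrupt roughly `ratio` of the words in a sentence.

    The default of 10% mirrors the noise level at which the paper
    reports no deterioration in translation quality.
    """
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(len(words) * ratio))
    for i in rng.sample(range(len(words)), min(n, len(words))):
        words[i] = corrupt_word(words[i], rng)
    return " ".join(words)

# Example: create a noisy copy of a source sentence.
print(inject_noise("the quick brown fox jumps over the lazy dog", ratio=0.1))
```

A TAFT-style setup could pair such noisy copies of source sentences with clean targets to build augmented fine-tuning data, though the exact augmentation recipe is defined in the paper itself.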
