Paper Title
Is Encoder-Decoder Redundant for Neural Machine Translation?
Paper Authors
Paper Abstract
The encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, along with the introduction and development of the attention mechanism, the encoder-decoder remains the de facto neural network architecture for state-of-the-art models. While the motivation for decoding information from some hidden space is straightforward, the strict separation of the encoding and decoding steps into an encoder and a decoder in the model architecture is not necessarily required. Compared to the task of autoregressive language modeling in the target language, machine translation simply has an additional source sentence as context. Given that today's neural language models can already handle rather long contexts in the target language, it is natural to ask whether simply concatenating the source and target sentences and training a language model to do translation would work. In this work, we investigate the aforementioned concept for machine translation. Specifically, we experiment with bilingual translation, translation with additional target monolingual data, and multilingual translation. In all cases, this alternative approach performs on par with the baseline encoder-decoder Transformer, suggesting that an encoder-decoder architecture might be redundant for neural machine translation.
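The abstract does not specify how the concatenated sequences are prepared for training. As a rough illustration of the idea, the sketch below shows one common way to format a bilingual pair for a decoder-only language model: join the source and target with a separator token and compute the loss only on target-side positions. The token-ID inputs, separator symbol, and the choice to mask the source-side loss are all assumptions for illustration, not details taken from the paper.

```python
import torch

def build_lm_example(src_ids, tgt_ids, sep_id, eos_id, ignore_label=-100):
    """Format one translation pair for decoder-only LM training (a sketch).

    The source and target token IDs are concatenated into a single sequence:
        src ... <sep> tgt ... <eos>
    Labels on the source side are set to ignore_label so that a standard
    cross-entropy loss (e.g. torch.nn.CrossEntropyLoss, whose default
    ignore_index is -100) only trains the model to predict the target.
    """
    input_ids = src_ids + [sep_id] + tgt_ids + [eos_id]
    # Mask out the source sentence and the separator from the loss;
    # the model still conditions on them through self-attention.
    labels = [ignore_label] * (len(src_ids) + 1) + tgt_ids + [eos_id]
    return torch.tensor(input_ids), torch.tensor(labels)
```

At inference time, translation under this setup reduces to ordinary left-to-right decoding: feed the source tokens plus the separator as the prompt and sample or beam-search the continuation until the end-of-sequence token.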