Paper Title

Self-Attention with Cross-Lingual Position Representation

Paper Authors

Liang Ding, Longyue Wang, Dacheng Tao

Paper Abstract

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios, e.g. machine translation, the PEs of source and target sentences are modeled independently. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with \emph{cross-lingual position representations} to model the bilingually aware latent structure for the input sentence. Specifically, we utilize bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT'14 English$\Rightarrow$German, WAT'17 Japanese$\Rightarrow$English, and WMT'17 Chinese$\Leftrightarrow$English translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.
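
The paper itself provides no code here; the sketch below is only a rough illustration of the core idea, combining the standard sinusoidal position encoding over the original token order with a second encoding computed over BTG-reordered positions, so that each source token also carries a target-order-aware position signal. The function names, the additive fusion, and the `btg_order` input (assumed to come from an external BTG reordering step) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sinusoidal_pe(positions, d_model):
    """Standard sinusoidal position encoding (Vaswani et al., 2017).

    positions: 1-D sequence of (possibly reordered) position indices.
    Returns an array of shape (len(positions), d_model).
    """
    positions = np.asarray(positions, dtype=np.float64)[:, None]  # (T, 1)
    dims = np.arange(d_model, dtype=np.float64)[None, :]          # (1, d)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros_like(angles)
    pe[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return pe

def cross_lingual_pe(seq_len, btg_order, d_model):
    """Hypothetical sketch of a cross-lingual position representation.

    btg_order[i] is the position token i would occupy after BTG-based
    reordering toward the target-language word order (assumed to be
    precomputed by an external BTG reordering model). The additive
    fusion is one plausible choice; the paper may combine the two
    position signals differently.
    """
    original = sinusoidal_pe(np.arange(seq_len), d_model)   # source-order PE
    reordered = sinusoidal_pe(btg_order, d_model)           # target-order-aware PE
    return original + reordered

# Toy usage: a 5-token source sentence whose BTG reordering swaps two phrases.
pe = cross_lingual_pe(seq_len=5, btg_order=[3, 4, 0, 1, 2], d_model=8)
print(pe.shape)  # (5, 8)
```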
