Third-Party Aligner for Neural Word Alignments

Paper title

Third-Party Aligner for Neural Word Alignments

Authors

Zhang, Jinpeng; Dong, Chuanqi; Duan, Xiangyu; Zhang, Yuqi; Zhang, Min

Abstract

Word alignment is the task of finding translationally equivalent words between source and target sentences. Previous work has demonstrated that self-training can achieve competitive word alignment results. In this paper, we propose to use word alignments generated by a third-party word aligner to supervise neural word alignment training. Specifically, when fine-tuning a pre-trained cross-lingual language model, the source and target words of each word pair aligned by the third-party aligner are trained to be close neighbors in the contextualized embedding space. Experiments on benchmarks covering various language pairs show that, surprisingly, our approach can self-correct the third-party supervision by finding more accurate word alignments and deleting wrong ones, leading to better performance than various third-party word aligners, including the current best one. When we integrate the supervision from all of the third-party aligners, we achieve state-of-the-art word alignment performance, with alignment error rates more than two points lower on average than those of the best third-party aligner. Our code is released at https://github.com/sdongchuanqi/Third-Party-Supervised-Aligner.
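To make "trained to be close neighbors in the contextualized embedding space" concrete, here is a minimal, dependency-free sketch. It assumes a bidirectional softmax objective over a dot-product similarity matrix, a common choice for embedding-based aligners, and it is not necessarily the exact loss used in the paper. All function names, the toy embeddings, and the 0.5 extraction threshold are hypothetical. In real training the vectors would come from the fine-tuned cross-lingual language model and the loss would be backpropagated into it. The last function computes alignment error rate (AER) with the standard definition AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where A is the predicted alignment, S the sure gold links, and P the possible gold links.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_probs(src_emb, tgt_emb):
    """Return (p_s2t, p_t2s), both indexed [i][j] for source word i, target word j."""
    sim = [[dot(s, t) for t in tgt_emb] for s in src_emb]
    # Source-to-target: each source word distributes probability over target words.
    p_s2t = [softmax(row) for row in sim]
    # Target-to-source: each target word distributes probability over source words.
    cols = [softmax([sim[i][j] for i in range(len(src_emb))]) for j in range(len(tgt_emb))]
    p_t2s = [[cols[j][i] for j in range(len(tgt_emb))] for i in range(len(src_emb))]
    return p_s2t, p_t2s


def supervised_loss(src_emb, tgt_emb, silver_links):
    """Hypothetical objective: mean negative log-likelihood of the third-party
    (silver) links in both directions. Lower means aligned pairs are closer."""
    if not silver_links:
        return 0.0
    p_s2t, p_t2s = alignment_probs(src_emb, tgt_emb)
    nll = 0.0
    for i, j in silver_links:
        nll -= math.log(p_s2t[i][j]) + math.log(p_t2s[i][j])
    return nll / (2 * len(silver_links))


def extract_alignments(src_emb, tgt_emb, threshold=0.5):
    """Keep a link only if both directions assign it probability above threshold."""
    p_s2t, p_t2s = alignment_probs(src_emb, tgt_emb)
    return {
        (i, j)
        for i in range(len(src_emb))
        for j in range(len(tgt_emb))
        if p_s2t[i][j] > threshold and p_t2s[i][j] > threshold
    }


def aer(predicted, sure, possible):
    """Alignment error rate; sure links also count as possible links."""
    possible = possible | sure
    denom = len(predicted) + len(sure)
    if denom == 0:
        return 0.0
    return 1.0 - (len(predicted & sure) + len(predicted & possible)) / denom


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]  # toy source-word embeddings
    tgt = [[5.0, 0.0], [0.0, 5.0]]  # toy target-word embeddings
    links = extract_alignments(src, tgt)
    print(sorted(links))
    print(supervised_loss(src, tgt, [(0, 0), (1, 1)]))
    print(aer(links, {(0, 0), (1, 1)}, set()))
```

Because extraction requires agreement in both directions, a third-party link that the fine-tuned embeddings do not support can drop out of the predicted set. This is one plausible mechanism for the kind of self-correction the abstract describes, offered here as an illustration rather than the paper's explanation.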
