Paper Title
Multilingual BERT Post-Pretraining Alignment
Paper Authors
Paper Abstract
We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved zero-shot cross-lingual transferability of the pretrained models. Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective as well as on the sentence level via contrastive learning and random input shuffling. We also perform sentence-level code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves comparable results to XLM for translate-train while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.
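To make the sentence-level alignment objective concrete, below is a minimal sketch of contrastive learning over parallel sentence pairs: the embedding of a sentence and that of its translation are pulled together while other sentences in the batch act as negatives. This is an illustrative symmetric InfoNCE formulation, not the paper's exact loss; the function name, temperature value, and the use of random tensors in place of mBERT sentence embeddings are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_emb: torch.Tensor,
                               tgt_emb: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """src_emb, tgt_emb: (batch, dim) sentence embeddings of parallel pairs.

    Each source sentence should be most similar to its own translation
    (the diagonal of the similarity matrix) and dissimilar to the other
    translations in the batch (in-batch negatives).
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature              # (batch, batch) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric InfoNCE: retrieve the translation from the source and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

if __name__ == "__main__":
    # Random embeddings standing in for mBERT sentence representations.
    en = torch.randn(8, 768)   # English sentences
    xx = torch.randn(8, 768)   # their translations in another language
    print(contrastive_alignment_loss(en, xx).item())
```

In practice such a loss would be combined with the word-level Translation Language Modeling objective described in the abstract; the batch size, pooling strategy, and weighting between the objectives are training details not specified here.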