Paper Title
Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model
Paper Authors
Paper Abstract
Pre-trained multilingual language models play an important role in cross-lingual natural language understanding tasks. However, existing methods do not focus on learning the semantic structure of representations and thus cannot fully optimize their performance. In this paper, we propose Multi-level Multilingual Knowledge Distillation (MMKD), a novel method for improving multilingual language models. Specifically, we employ a teacher-student framework to transfer the rich semantic representation knowledge in English BERT. We propose token-, word-, sentence-, and structure-level alignment objectives to encourage multiple levels of consistency between source-target pairs, as well as correlation similarity between the teacher and student models. We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD. Experimental results show that MMKD outperforms other baseline models of similar size on XNLI and XQuAD and obtains comparable performance on PAWS-X. In particular, MMKD achieves significant performance gains on low-resource languages.
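For intuition only, the sketch below shows one way a multi-level distillation objective of this kind could be combined, assuming a frozen English teacher and a multilingual student trained on parallel source-target sentence pairs. The function name, loss weights, pooling, and tensor shapes are illustrative assumptions, not the paper's actual implementation; the word-level objective, which would require an external word aligner, is omitted.

```python
# Hypothetical sketch (not the authors' code): combine token-, sentence-, and
# structure-level alignment losses between a frozen English teacher and a
# multilingual student. Shapes and loss weights are assumptions.
import torch
import torch.nn.functional as F


def distillation_loss(teacher_tok, student_tok, teacher_sent, student_sent,
                      w_tok=1.0, w_sent=1.0, w_struct=1.0):
    """teacher_tok / student_tok: (batch, seq_len, dim) aligned token states.
    teacher_sent / student_sent: (batch, dim) pooled sentence embeddings."""
    # Token-level: match aligned token representations directly.
    loss_tok = F.mse_loss(student_tok, teacher_tok)

    # Sentence-level: pull the student's pooled embedding of the target
    # sentence toward the teacher's embedding of the English source.
    loss_sent = 1.0 - F.cosine_similarity(student_sent, teacher_sent, dim=-1).mean()

    # Structure-level: match the pairwise similarity structure within the
    # batch (teacher vs. student sentence-to-sentence correlation matrices).
    t_norm = F.normalize(teacher_sent, dim=-1)
    s_norm = F.normalize(student_sent, dim=-1)
    loss_struct = F.mse_loss(s_norm @ s_norm.T, t_norm @ t_norm.T)

    return w_tok * loss_tok + w_sent * loss_sent + w_struct * loss_struct


if __name__ == "__main__":
    batch, seq_len, dim = 4, 16, 768
    t_tok, s_tok = torch.randn(batch, seq_len, dim), torch.randn(batch, seq_len, dim)
    t_sent, s_sent = torch.randn(batch, dim), torch.randn(batch, dim)
    print(distillation_loss(t_tok, s_tok, t_sent, s_sent).item())
```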