Paper Title
MPNet: Masked and Permuted Pre-training for Language Understanding
Paper Authors
Paper Abstract
BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects the dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from a position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence, thereby reducing the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB of text corpora) and fine-tune it on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM, and achieves better results on these tasks than previous state-of-the-art pre-training methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. The code and the pre-trained models are available at: https://github.com/microsoft/MPNet.
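To make the abstract's key idea more concrete, below is a minimal, illustrative Python sketch (not the authors' released code) of how MPNet-style inputs can be arranged: the tokens are permuted, the last few tokens in the permuted order become prediction targets and are replaced by mask placeholders, and the original positions of all tokens (including the masked ones) are still fed to the model, so the encoder sees the full sentence while the masked tokens are predicted in the permuted order. The function name `build_mpnet_inputs` and every detail here are assumptions made for illustration only.

```python
# Illustrative sketch of MPNet-style input construction (not the official implementation).
import random

MASK = "[MASK]"

def build_mpnet_inputs(tokens, num_pred, seed=0):
    """Return (input_tokens, input_positions, target_tokens, target_positions).

    Non-predicted tokens keep their content and original positions;
    predicted tokens are replaced by [MASK], but their original positions
    are still visible to the model. This is the key difference from PLM,
    where the position information of the predicted part is not exposed.
    """
    rng = random.Random(seed)
    perm = list(range(len(tokens)))
    rng.shuffle(perm)                       # a random factorization order

    non_pred, pred = perm[:-num_pred], perm[-num_pred:]

    # Content stream: non-predicted tokens followed by mask placeholders.
    input_tokens = [tokens[i] for i in non_pred] + [MASK] * num_pred
    # Position stream: original positions of *all* tokens, so the model
    # knows the true sentence length and where each prediction belongs.
    input_positions = non_pred + pred

    target_tokens = [tokens[i] for i in pred]
    return input_tokens, input_positions, target_tokens, pred


if __name__ == "__main__":
    sent = ["the", "task", "is", "sentence", "classification"]
    inp, pos, tgt, tgt_pos = build_mpnet_inputs(sent, num_pred=2)
    print("input tokens   :", inp)
    print("input positions:", pos)
    print("targets        :", list(zip(tgt_pos, tgt)))
    # During pre-training, the masked tokens are predicted one by one in the
    # permuted order, so each prediction can condition on previously predicted
    # tokens (unlike MLM, which predicts all masked tokens independently).
```

In this sketch, the conditional, ordered prediction of the masked part reflects the "dependency among predicted tokens" inherited from PLM, while feeding the positions of the masked tokens reflects the "auxiliary position information" that lets the model see a full sentence, as described in the abstract.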