Paper Title


mGPT: Few-Shot Learners Go Multilingual

Authors

Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Anastasia Kozlova, Tatiana Shavrina

Abstract


Recent studies report that autoregressive language models can successfully solve many NLP tasks via zero- and few-shot learning paradigms, which opens up new possibilities for using pre-trained language models. This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and the Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism; the DeepSpeed and Megatron frameworks allow us to parallelize the training and inference steps effectively. The resulting models show performance on par with the recently released XGLM models by Facebook, while covering more languages and enhancing NLP possibilities for low-resource languages of the CIS countries and the small nations of Russia. We detail the motivation behind the architecture design choices, thoroughly describe the data preparation pipeline, and train five small versions of the model to choose the optimal multilingual tokenization strategy. We measure the model's perplexity in all covered languages and evaluate it on a wide spectrum of multilingual tasks, including classification, generation, sequence labeling, and knowledge probing. The models were evaluated in zero-shot and few-shot settings. Furthermore, we compared the classification-task performance with the state-of-the-art multilingual model XGLM. The source code and the mGPT XL model are publicly released.
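Since the mGPT XL checkpoint and source code are publicly released, the following is a minimal sketch (not taken from the paper) of how such a released causal LM could be queried in a few-shot setting and how per-text perplexity could be measured with Hugging Face Transformers. The checkpoint id "ai-forever/mGPT", the demonstration prompt, and the example sentence are assumptions for illustration only.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed public checkpoint id; not specified in the abstract.
model_name = "ai-forever/mGPT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Few-shot prompt: two demonstrations followed by a query (illustrative only).
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: bird -> French:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Perplexity of a short text, computed as exp of the mean token-level
# cross-entropy loss under the language model.
text = "Example sentence in any of the covered languages."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())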
