Paper Title
From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
Paper Authors
Paper Abstract
Massively multilingual transformers pretrained with language modeling objectives (e.g., mBERT, XLM-R) have become the de facto paradigm for zero-shot cross-lingual transfer in NLP, offering unmatched transfer performance. Current downstream evaluations, however, verify their efficacy predominantly in transfer settings involving languages with sufficient amounts of pretraining data and lexically and typologically close language pairs. In this work, we analyze their limitations and show that cross-lingual transfer via massively multilingual transformers, much like transfer via cross-lingual word embeddings, is substantially less effective in resource-lean scenarios and for distant languages. Our experiments, encompassing three lower-level tasks (POS tagging, dependency parsing, NER) as well as two higher-level semantic tasks (NLI, QA), empirically correlate transfer performance with the linguistic similarity between the source and target languages, but also with the size of the target language's pretraining corpus. We also demonstrate the surprising effectiveness of inexpensive few-shot transfer (i.e., fine-tuning on a few target-language instances after fine-tuning on the source language) across the board. This suggests that additional research effort should be invested in reaching beyond the limiting zero-shot conditions.
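
Below is a minimal sketch of the few-shot transfer recipe the abstract describes, assuming the Hugging Face transformers and datasets libraries. The choice of the NLI task (XNLI), Swahili as the distant target language, the number of few-shot instances, and all hyperparameters are illustrative assumptions, not the paper's exact experimental setup.

# Few-shot cross-lingual transfer: fine-tune on the source language,
# then continue fine-tuning on a few target-language instances.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"  # mBERT; "xlm-roberta-base" for XLM-R
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    # XNLI instances are premise/hypothesis pairs with a 3-way label.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

def fine_tune(model, dataset, output_dir, epochs):
    # One standard supervised fine-tuning pass over the given dataset.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=32)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

# Step 1: fine-tune on the full source-language (English) training set.
# Evaluating this model directly on a target language is zero-shot transfer.
en_train = load_dataset("xnli", "en", split="train").map(tokenize, batched=True)
model = fine_tune(model, en_train, "ckpt-source", epochs=3)

# Step 2 (few-shot transfer): continue fine-tuning on a handful of labeled
# target-language instances -- the inexpensive step the abstract reports as
# surprisingly effective, especially for distant, low-resource targets.
sw_few = (load_dataset("xnli", "sw", split="train")
          .shuffle(seed=42).select(range(10))  # 10 instances, an arbitrary choice
          .map(tokenize, batched=True))
model = fine_tune(model, sw_few, "ckpt-target", epochs=10)

The essential contrast with the zero-shot setting is Step 2: the abstract's claim is that even a small second fine-tuning pass on target-language data recovers much of the performance lost on distant, low-resource targets.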