Paper Title
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Paper Authors
Paper Abstract
Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.
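For concreteness, the evaluation setting the abstract describes can be sketched in code. The following is a minimal illustrative Python sketch of the zero-shot cross-lingual transfer protocol (fine-tune a pretrained multilingual encoder on English labeled data only, then score the same model on each target-language test set); the function names, dummy model, and toy data are assumptions for illustration, not the paper's code or the released benchmark API.

# Illustrative sketch of the zero-shot cross-lingual transfer protocol
# that XTREME measures. The model, data, and helpers here are dummies;
# a real run would fine-tune a multilingual encoder such as mBERT or XLM-R.

from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (sentence, label)

def evaluate_zero_shot_transfer(
    fine_tune: Callable[[List[Example]], Callable[[str], int]],
    english_train: List[Example],
    test_sets: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Fine-tune on English data only, then report per-language accuracy."""
    model = fine_tune(english_train)  # no target-language supervision
    scores = {}
    for lang, examples in test_sets.items():
        correct = sum(model(text) == label for text, label in examples)
        scores[lang] = correct / len(examples)
    return scores

if __name__ == "__main__":
    # Toy data; hypothetical placeholders, not XTREME task data.
    train = [("a good film", 1), ("a bad film", 0)]
    tests = {
        "de": [("ein guter Film", 1)],
        "sw": [("filamu mbaya", 0)],
    }
    dummy_fine_tune = lambda data: (lambda text: 1)  # always predicts class 1
    print(evaluate_zero_shot_transfer(dummy_fine_tune, train, tests))

Reporting a score per target language, as in the returned dictionary, is what exposes the two findings the abstract highlights: the gap between English performance and transferred performance, and the wide spread of results across languages.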