Paper Title

Are Pretrained Multilingual Models Equally Fair Across Languages?

Paper Authors

Laura Cabello Piqueras, Anders Søgaard

Paper Abstract

Pretrained multilingual language models can help bridge the digital language divide, enabling high-quality NLP models for lower-resourced languages. Studies of multilingual models have so far focused on performance, consistency, and cross-lingual generalisation. However, with their widespread application in the wild and downstream societal impact, it is important to put multilingual models under the same scrutiny as monolingual models. This work investigates the group fairness of multilingual models, asking whether these models are equally fair across languages. To this end, we create a new four-way multilingual dataset of parallel cloze test examples (MozArt), equipped with demographic information (balanced with regard to gender and native tongue) about the test participants. We evaluate three multilingual models on MozArt -- mBERT, XLM-R, and mT5 -- and show that across the four target languages, the three models exhibit different levels of group disparity, e.g., exhibiting near-equal risk for Spanish, but high levels of disparity for German.
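
The abstract reports per-language group disparity, with "near-equal risk" meaning the error rates of demographic groups are close. Below is a minimal sketch, not the authors' released code, of how such a disparity could be computed from per-example cloze outcomes tagged with a language and a demographic attribute. The function name `group_disparity` and the record fields (`language`, `group`, `correct`) are illustrative assumptions, not the paper's API.

```python
from collections import defaultdict

def group_disparity(examples):
    """Compute, per language, the max gap in error rate (risk) across
    demographic groups. examples: iterable of dicts with keys
    'language', 'group', and 'correct' (bool)."""
    # Tally correct/total predictions per (language, group) cell.
    stats = defaultdict(lambda: [0, 0])  # (language, group) -> [correct, total]
    for ex in examples:
        cell = stats[(ex["language"], ex["group"])]
        cell[0] += int(ex["correct"])
        cell[1] += 1

    # Error rate (risk) per group within each language.
    by_lang = defaultdict(dict)
    for (lang, group), (correct, total) in stats.items():
        by_lang[lang][group] = 1.0 - correct / total

    # Disparity per language: largest risk gap between any two groups.
    # A gap near zero corresponds to near-equal risk.
    return {lang: max(risks.values()) - min(risks.values())
            for lang, risks in by_lang.items()}

# Toy usage (fabricated records, mirroring the abstract's qualitative claim):
examples = [
    {"language": "es", "group": "female", "correct": True},
    {"language": "es", "group": "male",   "correct": True},
    {"language": "de", "group": "female", "correct": False},
    {"language": "de", "group": "male",   "correct": True},
]
print(group_disparity(examples))  # {'es': 0.0, 'de': 1.0}
```

In this toy run, Spanish shows a zero risk gap (near-equal risk) while German shows a large one, matching the pattern the abstract describes; the actual paper evaluates mBERT, XLM-R, and mT5 on MozArt rather than toy records.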
