Paper Title

Life after BERT: What do Other Muppets Understand about Language?

Paper Authors

Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky

Paper Abstract

Existing pre-trained transformer analysis works usually focus only on one or two model families at a time, overlooking the variability of the architecture and pre-training objectives. In our work, we utilize the oLMpics benchmark and psycholinguistic probing datasets for a diverse set of 29 models including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive models and evaluate GPT networks of different sizes. Our findings show that none of these models can resolve compositional questions in a zero-shot fashion, suggesting that this skill is not learnable using existing pre-training objectives. Furthermore, we find that global model decisions such as architecture, directionality, size of the dataset, and pre-training objective are not predictive of a model's linguistic capabilities.
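
The zero-shot adaptation for autoregressive models can be pictured as scoring each multiple-choice candidate by the probability the language model assigns to the filled-in sentence, then picking the highest-scoring option. The snippet below is a minimal sketch of that idea using the HuggingFace transformers API with GPT-2; the prompt wording and scoring details are illustrative assumptions, not the authors' exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Total log-probability of `text` under the autoregressive model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns the mean cross-entropy over the
        # ids.size(1) - 1 predicted tokens; negate and rescale to a summed log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# An oLMpics-style age-comparison question with two candidate answers
# (the wording is illustrative, not taken from the benchmark files).
question = "A 41 year old person is {} than a 35 year old person."
candidates = ["older", "younger"]

prediction = max(candidates, key=lambda c: sequence_log_prob(question.format(c)))
print(prediction)
```

Ranking candidates by summed log-probability rather than per-token average keeps the comparison fair when answer options tokenize to different lengths.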
