Paper Title
Finding Experts in Transformer Models
Paper Authors
Paper Abstract
In this work we study the presence of expert units in pre-trained Transformer Models (TM), and how they impact a model's performance. We define expert units to be neurons that are able to classify a concept with a given average precision, where a concept is represented by a binary set of sentences containing the concept (or not). Leveraging the OneSec dataset (Scarlini et al., 2019), we compile a dataset of 1641 concepts that allows diverse expert units in TM to be discovered. We show that expert units are important in several ways: (1) The presence of expert units is correlated ($r^2=0.833$) with the generalization power of TM, which allows ranking TM without requiring fine-tuning on suites of downstream tasks. We further propose an empirical method to decide how accurate such experts should be to evaluate generalization. (2) The overlap of top experts between concepts provides a sensible way to quantify concept co-learning, which can be used for explainability of unknown concepts. (3) We show how to self-condition off-the-shelf pre-trained language models to generate text with a given concept by forcing the top experts to be active, without requiring re-training the model or using additional parameters.
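To make the abstract's definition of expert units concrete, the minimal sketch below shows one way to score every unit of a model by average precision on a single concept and to compare two concepts by the overlap of their top experts. This is not the paper's released code: the array names (`activations`, `labels`), the per-sentence pooling choice, and the Jaccard overlap metric are illustrative assumptions.

```python
# Hedged sketch: ranking candidate "expert units" for one concept.
# Assumes `activations` is an (n_sentences, n_units) array of per-unit
# responses (e.g. max-pooled over tokens) collected from a pre-trained
# Transformer, and `labels` marks whether each sentence contains the
# concept (1) or not (0).
import numpy as np
from sklearn.metrics import average_precision_score


def rank_expert_units(activations: np.ndarray, labels: np.ndarray, top_k: int = 10):
    """Score each unit by how well its activation separates concept-positive
    sentences from concept-negative ones (average precision), then return
    the indices and scores of the top_k units."""
    ap_per_unit = np.array([
        average_precision_score(labels, activations[:, unit])
        for unit in range(activations.shape[1])
    ])
    top_units = np.argsort(ap_per_unit)[::-1][:top_k]
    return top_units, ap_per_unit[top_units]


def expert_overlap(top_a, top_b) -> float:
    """Quantify co-learning of two concepts as the Jaccard overlap of their
    top expert-unit index sets (an assumed metric; the paper's exact
    overlap measure may differ)."""
    a, b = set(map(int, top_a)), set(map(int, top_b))
    return len(a & b) / len(a | b)


# Toy usage with random data; real inputs would come from forwarding the
# OneSec-derived positive/negative sentences for a concept through the model.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 768))
labs = rng.integers(0, 2, size=200)
units, scores = rank_expert_units(acts, labs)
print(units, scores)
```

In practice the activations would be gathered once per concept from the frozen pre-trained model, so the ranking requires no fine-tuning; the self-conditioned generation described in point (3) would then clamp the top-ranked units to high activation values at inference time, which is not shown here.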