Paper Title

Information-Theoretic Probing for Linguistic Structure

Paper Authors

Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell

Paper Abstract

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research, plus English, totalling eleven languages.
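To make the "tighter estimate" claim concrete, here is a minimal sketch of the underlying information-theoretic reasoning, under notation assumed for this summary (R is the learned representation, Y the linguistic property, and q the probe's predictive distribution):

    % Mutual information between the representation and the property
    I(R; Y) = H(Y) - H(Y \mid R)

    % A probe q can only overestimate the conditional entropy: its
    % cross-entropy exceeds H(Y | R) by an expected KL divergence gap.
    H_q(Y \mid R) = H(Y \mid R)
                  + \mathbb{E}_{R}\left[ \mathrm{KL}\!\left( p(\cdot \mid R) \,\|\, q(\cdot \mid R) \right) \right]
                  \;\ge\; H(Y \mid R)

Substituting the second identity into the first, the quantity H(Y) - H_q(Y | R) is a lower bound on I(R; Y). A better-performing probe achieves a lower cross-entropy H_q(Y | R) and therefore a tighter bound, which is why choosing the highest-performing probe, however complex, reveals more of the information inherent in the representation.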
