Paper Title
Probing BERT's priors with serial reproduction chains
Paper Authors
Paper Abstract
Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT's priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than other methods in a large corpus of naturalness judgments. Our findings establish a firmer theoretical foundation for bottom-up probing and highlight richer deviations from human priors.
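The GSN procedure described in the abstract (repeatedly choosing a random token position, masking it, and resampling it from BERT's conditional distribution) is straightforward to sketch. Below is a minimal, illustrative implementation assuming the HuggingFace `transformers` library and the `bert-base-uncased` checkpoint; the function name `gsn_chain` and the hyperparameters are hypothetical, not the paper's released code.

```python
# Minimal sketch of a GSN-style serial reproduction chain over BERT.
# Assumes the HuggingFace `transformers` library; not the paper's own code.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def gsn_chain(seed_sentence: str, n_steps: int = 500) -> str:
    """At each step, mask one randomly chosen token and resample it
    from BERT's conditional distribution at that position."""
    ids = tokenizer(seed_sentence, return_tensors="pt")["input_ids"][0]
    for _ in range(n_steps):
        # Pick a random position, excluding the [CLS] and [SEP] specials.
        pos = int(torch.randint(1, len(ids) - 1, (1,)))
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        # Sample from the conditional rather than taking the argmax,
        # so the chain's stationary distribution estimates the joint.
        ids[pos] = int(torch.multinomial(torch.softmax(logits, dim=-1), 1))
    return tokenizer.decode(ids[1:-1])

print(gsn_chain("the cat sat on the mat ."))
```

Because an MLM can only replace tokens in place, a chain of this form keeps sentence length fixed by construction, and one would typically discard an initial burn-in before treating later chain states as samples from the model's prior.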