论文标题

依赖树的无偏和有效抽样

Unbiased and Efficient Sampling of Dependency Trees

论文作者

Stanojević, Miloš

论文摘要

依赖性语法的大多数计算模型由跨越树的分布组成。但是,大多数依赖性树库都要求每个有效的依赖树都有一个从根节点出来的单个边缘,这是一个不属于跨越树的定义的约束。因此,所有用于跨树的标准推理算法对于依赖树的推断都是次优的。 Zmigrod等。 (2021b)提出的用于采样的算法,并从依赖树分布中替换有或不替换,该算法包含单根约束。在本文中,我们表明,它们用于替代品采样的最快算法Wilson-RC实际上正在生产有偏见的样品,我们提供了两种无偏见的替代方法。此外,我们提出了两种算法(一种增量,一种平行),以减少用于对K树采样算法的渐近运行时,而无需替换为O(KN3)。这些算法在渐近和实际上更有效。

Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason all standard inference algorithms for spanning trees are suboptimal for inference over dependency trees. Zmigrod et al. (2021b) proposed algorithms for sampling with and without replacement from the dependency tree distribution that incorporate the single-root constraint. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact producing biased samples and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of algorithm for sampling k trees without replacement to O(kn3). These algorithms are both asymptotically and practically more efficient.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源