Paper Title
Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation
Paper Authors
Paper Abstract
Recently, CKY-based models have shown great potential in unsupervised grammar induction thanks to their human-like encoding paradigm, which runs recursively and hierarchically but requires $O(n^3)$ time complexity. The Recursive Transformer based on Differentiable Trees (R2D2) makes it possible to scale to large language model pre-training even with a complex tree encoder by introducing a heuristic pruning method. However, the rule-based pruning approach suffers from local optima and slow inference. In this paper, we address these issues with a unified method. We propose to use a top-down parser as a model-based pruning method, which also enables parallel encoding during inference. Specifically, our parser casts parsing as a split-point scoring task: it first scores all split points for a given sentence and then recursively splits a span into two by picking the split point with the highest score within the current span. The reverse order of the splits is taken as the pruning order in the R2D2 encoder. Besides the bidirectional language model loss, we also optimize the parser by minimizing the KL divergence between the tree probabilities of the parser and R2D2. Our experiments show that Fast-R2D2 significantly improves performance in grammar induction and achieves competitive results in downstream classification tasks.
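The following is a minimal sketch (not the authors' released code) of the top-down split-point procedure the abstract describes: given per-position split scores, a span is recursively divided at its highest-scoring split point, and the reverse of that split order gives the bottom-up order in which spans would be composed (i.e., the pruning order for the R2D2 encoder). The `split_scores` input is a hypothetical stand-in for the learned parser's outputs.

```python
# Sketch of top-down split-point parsing, assuming split_scores[i] is a score
# for splitting between token i and i+1 (in Fast-R2D2 these come from the
# learned top-down parser; here they are just given as input).

def topdown_splits(split_scores):
    """Recursively split the whole sentence by always picking the
    highest-scoring split point inside the current span; return the
    split points in the order they were chosen."""
    order = []

    def split(lo, hi):  # span covers tokens lo..hi (inclusive)
        if hi - lo < 1:
            return
        # candidate split point k divides the span into lo..k and k+1..hi
        k = max(range(lo, hi), key=lambda i: split_scores[i])
        order.append(k)
        split(lo, k)
        split(k + 1, hi)

    split(0, len(split_scores))
    return order


if __name__ == "__main__":
    # 5 tokens -> 4 candidate split points between adjacent tokens
    scores = [0.1, 0.9, 0.3, 0.7]
    top_down = topdown_splits(scores)
    # Reversing the top-down order yields the bottom-up composition order,
    # which is used as the pruning order in the R2D2 encoder.
    bottom_up = list(reversed(top_down))
    print(top_down, bottom_up)  # [1, 0, 3, 2] [2, 3, 0, 1]
```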