Paper Title
Constrained Sequence-to-Tree Generation for Hierarchical Text Classification
Authors
Abstract
Hierarchical Text Classification (HTC) is a challenging task in which a document can be assigned to multiple hierarchically structured categories within a taxonomy. Most prior studies treat HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem (e.g., predicting a child category without its parent). In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to ensure the label consistency of the results. Compared with previous work, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
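The abstract does not specify how the dynamic vocabulary is built, so the following is only a hypothetical sketch of the general idea and not the authors' Seq2Tree implementation. It assumes a tree-shaped taxonomy stored as a parent-to-children dict (`TAXONOMY`, invented for illustration), a placeholder `score_fn` standing in for a trained decoder, and plain greedy decoding. At each step the allowed vocabulary is limited to not-yet-generated children of the root or of an already generated label, plus an end token. As a result, every emitted label's parent is also in the output, which is the label-consistency property that `is_consistent` checks.

```python
from typing import Callable, Dict, List, Sequence, Set

ROOT = "<root>"
EOS = "<eos>"

# Hypothetical toy taxonomy (parent -> children); not from the paper's datasets.
TAXONOMY: Dict[str, List[str]] = {
    ROOT: ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}


def is_consistent(labels: Set[str], taxonomy: Dict[str, List[str]]) -> bool:
    """True if every label's parent is the root or is itself among the labels."""
    parent = {child: p for p, children in taxonomy.items() for child in children}
    return all(parent.get(lab) == ROOT or parent.get(lab) in labels for lab in labels)


def dynamic_vocab(generated: Sequence[str], taxonomy: Dict[str, List[str]]) -> List[str]:
    """Tokens allowed next: unseen children of the root or of any generated label, plus EOS."""
    seen = set(generated)
    allowed: List[str] = []
    for node in [ROOT, *generated]:
        for child in taxonomy.get(node, []):
            if child not in seen and child not in allowed:
                allowed.append(child)
    allowed.append(EOS)
    return allowed


def constrained_greedy_decode(
    score_fn: Callable[[Sequence[str], str], float],
    taxonomy: Dict[str, List[str]],
    max_len: int = 10,
) -> List[str]:
    """Greedy decoding where each step may only pick from the dynamic vocabulary."""
    generated: List[str] = []
    for _ in range(max_len):
        vocab = dynamic_vocab(generated, taxonomy)
        best = max(vocab, key=lambda tok: score_fn(generated, tok))
        if best == EOS:
            break
        generated.append(best)
    return generated


if __name__ == "__main__":
    # Stand-in for a trained decoder: fixed per-token scores that ignore the prefix.
    toy_scores = {"Physics": 0.9, "Tennis": 0.8, "Science": 0.6, EOS: 0.5,
                  "Biology": 0.2, "Sports": 0.1, "Football": 0.05}

    def toy_score(prefix: Sequence[str], tok: str) -> float:
        return toy_scores.get(tok, 0.0)

    # A flat top-2 pick ("Physics", "Tennis") omits both parents.
    print(is_consistent({"Physics", "Tennis"}, TAXONOMY))  # False
    labels = constrained_greedy_decode(toy_score, TAXONOMY)
    print(labels, is_consistent(set(labels), TAXONOMY))  # ['Science', 'Physics'] True
```

With the toy scores, an unconstrained top-2 selection would return two leaf labels without their parents, whereas the constrained decoder must first emit "Science" before "Physics" becomes available, and then stops at the end token. A real system would apply the same masking to the decoder's logits rather than to a hand-written scoring function.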