Paper title

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Authors

Yu, Chao; Shen, Yi; Mao, Yue; Cai, Longjun

Abstract

Hierarchical Text Classification (HTC) is a challenging task in which a document can be assigned to multiple hierarchically structured categories within a taxonomy. Most prior studies treat HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to ensure the label consistency of the results. Compared with previous work, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
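To make the idea of constrained decoding with a dynamic vocabulary more concrete, the following is a minimal sketch and not the authors' Seq2Tree implementation. It assumes a depth-first traversal with a hypothetical `<pop>` token that returns to the parent node; the toy taxonomy, the fixed `SCORES` table (standing in for a trained decoder's scores), and all function names are illustrative assumptions. At each step the candidate set is restricted to the children of the current node, so a label can only be emitted after its parent.

```python
from typing import Callable, Dict, List, Set

ROOT, POP = "<root>", "<pop>"  # hypothetical special tokens

# Hypothetical toy taxonomy: parent -> children.
TAXONOMY: Dict[str, List[str]] = {
    ROOT: ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["tennis"],
}

def allowed_tokens(stack: List[str], emitted: Set[str]) -> List[str]:
    """Dynamic vocabulary: not-yet-emitted children of the current node, plus POP."""
    children = [c for c in TAXONOMY.get(stack[-1], []) if c not in emitted]
    return children + [POP]

def constrained_decode(score: Callable[[str], float], max_steps: int = 50) -> List[str]:
    """Greedy depth-first decoding restricted to the dynamic vocabulary."""
    stack: List[str] = [ROOT]
    emitted: Set[str] = set()
    output: List[str] = []
    for _ in range(max_steps):
        best = max(allowed_tokens(stack, emitted), key=score)
        if best == POP:
            stack.pop()          # go back to the parent node
            if not stack:        # popped the root: decoding is finished
                break
        else:
            emitted.add(best)
            output.append(best)
            stack.append(best)   # descend into the chosen label
    return output

def is_consistent(labels: List[str]) -> bool:
    """A label set is consistent if every label's parent is also selected."""
    parent = {c: p for p, cs in TAXONOMY.items() for c in cs}
    chosen = set(labels)
    return all(parent[l] == ROOT or parent[l] in chosen for l in labels)

# Toy scores standing in for a model; a real decoder would condition
# on the document and on the tokens generated so far.
SCORES = {"science": 0.9, "physics": 0.8, "biology": 0.2,
          "sports": 0.1, "tennis": 0.95, POP: 0.5}

result = constrained_decode(lambda tok: SCORES[tok])
print(result, is_consistent(result))
```

In this toy run, `tennis` has the highest raw score but is never emitted, because its parent `sports` is not selected and `tennis` therefore never enters the candidate vocabulary. A flat multi-label classifier thresholding the same scores could output `tennis` without `sports`, which is the kind of inconsistency the abstract refers to.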
