Paper Title

A Unified Positive-Unlabeled Learning Framework for Document-Level Relation Extraction with Different Levels of Labeling

Paper Authors

Ye Wang, Xinxin Liu, Wenxin Hu, Tao Zhang

Paper Abstract

Document-level relation extraction (RE) aims to identify relations between entities across multiple sentences. Most previous methods have focused on document-level RE under full supervision. However, in real-world scenarios it is expensive and difficult to completely label all relations in a document, because the number of entity pairs in document-level RE grows quadratically with the number of entities. To address the common incomplete-labeling problem, we propose a unified positive-unlabeled learning framework: shift and squared ranking loss positive-unlabeled (SSR-PU) learning. We are the first to apply positive-unlabeled (PU) learning to document-level RE. Considering that the labeled data of a dataset may cause a prior shift in the unlabeled data, we introduce PU learning under a prior shift of the training data. In addition, using the none-class score as an adaptive threshold, we propose a squared ranking loss and prove its Bayesian consistency with multi-label ranking metrics. Extensive experiments demonstrate that our method improves on the previous baseline by about 14 F1 points under incomplete labeling. Moreover, it also outperforms previous state-of-the-art results under both fully supervised and extremely unlabeled settings.
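To make the PU component of the abstract concrete, below is a minimal PyTorch-style sketch of a non-negative PU risk estimator, treating one relation class as a binary positive-vs-unlabeled problem over entity pairs. It follows the standard nnPU formulation (Kiryo et al., 2017) with a logistic surrogate loss; the function name nn_pu_risk, the prior argument, and the loss choice are illustrative assumptions, not the exact SSR-PU objective, which additionally corrects the class prior for the shift between labeled and unlabeled data and replaces the surrogate with the proposed squared ranking loss against the none-class score.

```python
import torch
import torch.nn.functional as F


def nn_pu_risk(scores: torch.Tensor, labeled: torch.Tensor, prior: float) -> torch.Tensor:
    """Non-negative PU risk for one relation class (illustrative sketch, not the exact SSR-PU loss).

    scores  : (N,) raw logits of entity pairs for this class
    labeled : (N,) bool, True if the pair is labeled positive, False if unlabeled
    prior   : assumed class prior pi = P(y = 1); SSR-PU would further adjust this
              prior to account for the shift between labeled and unlabeled data.
    Assumes the batch contains both labeled and unlabeled pairs.
    """
    pos_scores = scores[labeled]
    unl_scores = scores[~labeled]

    # Logistic surrogate loss: l(z, +1) = log(1 + exp(-z)), l(z, -1) = log(1 + exp(z)).
    risk_pos = F.softplus(-pos_scores).mean()         # labeled positives treated as positive
    risk_pos_as_neg = F.softplus(pos_scores).mean()   # labeled positives treated as negative
    risk_unl_as_neg = F.softplus(unl_scores).mean()   # unlabeled pairs treated as negative

    # Non-negative correction (Kiryo et al., 2017): clamp the estimated negative risk at zero
    # to keep the risk estimate from going negative and overfitting the unlabeled data.
    risk_neg = torch.clamp(risk_unl_as_neg - prior * risk_pos_as_neg, min=0.0)
    return prior * risk_pos + risk_neg
```

In the multi-label document-level RE setting, such a risk would be computed per relation class and summed, with each class prior estimated from the training data.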
