Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation

Title

Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation

Authors

Woohwan Jung, Kyuseok Shim

Abstract

Relation extraction (RE) has been studied extensively because of its importance in real-world applications such as knowledge base construction and question answering. Most existing works train models on either distantly supervised data or human-annotated data. To take advantage of both the high accuracy of human annotation and the low cost of distant supervision, we propose a dual supervision framework that effectively utilizes both types of data. However, simply combining the two types of data to train an RE model may decrease prediction accuracy, because distant supervision has labeling bias. To prevent incorrect distant-supervision labels from degrading accuracy, we employ two separate prediction networks, HA-Net and DS-Net, which predict the labels assigned by human annotation and by distant supervision, respectively. Furthermore, we propose an additional loss term, called the disagreement penalty, that enables HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks that adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level RE confirms the effectiveness of the dual supervision framework.
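The abstract does not give the exact form of the disagreement penalty or of the bias-assessing networks. As a rough illustration only, the Python sketch below assumes that an encoder has already produced one score vector per head for each example. Human-annotated examples are routed to an HA-Net cross-entropy, and distantly supervised examples to a DS-Net cross-entropy. A KL-divergence term, weighted by a hypothetical coefficient `lam`, stands in for the disagreement penalty. The field names, `lam`, and the KL choice are all assumptions rather than the authors' formulation, and the adaptive, context-dependent bias assessment is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold relation index."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same relation labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_supervision_loss(batch, lam=0.5):
    """Hypothetical combined loss, averaged over the batch.

    Each example is a dict with (hypothetical) keys:
      'ha_logits', 'ds_logits': scores from the HA-Net and DS-Net heads
      'label': gold relation index
      'source': 'human' or 'distant'
    """
    if not batch:
        raise ValueError("batch must not be empty")
    total = 0.0
    for ex in batch:
        p_ha = softmax(ex["ha_logits"])
        p_ds = softmax(ex["ds_logits"])
        if ex["source"] == "human":
            # Human labels supervise HA-Net directly.
            total += cross_entropy(p_ha, ex["label"])
        else:
            # Distant labels supervise only DS-Net...
            total += cross_entropy(p_ds, ex["label"])
            # ...and reach HA-Net indirectly through an assumed KL penalty.
            # With autograd, one might stop gradients through p_ds here;
            # this plain-Python version only computes loss values.
            total += lam * kl_divergence(p_ds, p_ha)
    return total / len(batch)
```

Keeping the two heads separate means that noisy distant labels never enter HA-Net's cross-entropy directly. They influence HA-Net only through the weighted penalty term.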
