Paper Title

Clustering-based Unsupervised Generative Relation Extraction

Paper Authors

Yuan, Chenhan; Rossi, Ryan; Katz, Andrew; Eldardiry, Hoda

Abstract

This paper focuses on the problem of unsupervised relation extraction. Existing probabilistic generative model-based relation extraction methods work by extracting sentence features and using these features as inputs to train a generative model, which is then used to cluster similar relations. However, these methods do not consider correlations between sentences that share the same entity pair during training, which can negatively impact model performance. To address this issue, we propose a Clustering-based Unsupervised generative Relation Extraction (CURE) framework that leverages an "Encoder-Decoder" architecture to perform self-supervised learning, so that the encoder can extract relation information. Given multiple sentences with the same entity pair as input, self-supervised learning is performed by predicting the shortest path between the entity pair on the dependency graph of one of the sentences. The trained encoder is then used to extract relation information, and entity pairs that share the same relation are clustered based on this information. Each cluster is labeled with a few words drawn from the shortest paths corresponding to its entity pairs; these labels describe the meaning of the relation clusters. We compare the triplets extracted by our proposed framework (CURE) and by baseline methods against a ground-truth Knowledge Base. Experimental results show that our model outperforms state-of-the-art models on both the New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.
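
To make the self-supervision signal concrete, below is a minimal sketch, not the authors' code, of extracting the shortest path between an entity pair on a sentence's dependency graph, using spaCy and networkx. The function name and the single-token entity lookup are illustrative assumptions; the paper's actual preprocessing is not specified in the abstract.

```python
# Minimal sketch of shortest-dependency-path extraction (assumes spaCy with
# "en_core_web_sm" downloaded, and networkx); not the paper's implementation.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def shortest_dependency_path(sentence, head_entity, tail_entity):
    """Return the tokens on the shortest path between two (single-token)
    entities in the sentence's dependency graph."""
    doc = nlp(sentence)
    # Treat dependency arcs as undirected edges over token indices.
    edges = [(token.i, child.i) for token in doc for child in token.children]
    graph = nx.Graph(edges)
    # Locate entity tokens by surface form; a real system would use proper
    # entity spans rather than first-match lookup (an assumption here).
    head = next(tok.i for tok in doc if tok.text == head_entity)
    tail = next(tok.i for tok in doc if tok.text == tail_entity)
    return [doc[i].text for i in nx.shortest_path(graph, head, tail)]

print(shortest_dependency_path("Obama was born in Honolulu.", "Obama", "Honolulu"))
# -> ['Obama', 'born', 'in', 'Honolulu']
```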
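
The clustering-and-labeling step can be sketched in the same spirit: entity pairs are grouped by their encoder-derived relation vectors and each cluster is labeled with the most frequent words on its members' shortest paths. The abstract does not name a clustering algorithm, so KMeans, along with every name and parameter below, is an assumption for illustration only.

```python
# Illustrative sketch of relation clustering and cluster labeling; the choice
# of KMeans and all names/parameters are assumptions, not taken from the paper.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(relation_vectors, shortest_paths, n_clusters=10, top_k=3):
    """relation_vectors: (n_pairs, dim) encoder outputs, one per entity pair.
    shortest_paths: list of token lists, one dependency path per entity pair."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(np.asarray(relation_vectors))
    labels = {}
    for c in range(n_clusters):
        # Label each cluster with the most common shortest-path words
        # among the entity pairs assigned to it.
        counts = Counter(w for i, path in enumerate(shortest_paths)
                         if assignments[i] == c for w in path)
        labels[c] = [w for w, _ in counts.most_common(top_k)]
    return assignments, labels
```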
