KGClean: An Embedding Powered Knowledge Graph Cleaning Framework

Paper Title

KGClean: An Embedding Powered Knowledge Graph Cleaning Framework

Authors

Congcong Ge, Yunjun Gao, Honghui Weng, Chong Zhang, Xiaoye Miao, Baihua Zheng

Abstract

Quality assurance of a knowledge graph is a prerequisite for various knowledge-driven applications. We propose KGClean, a novel cleaning framework powered by knowledge graph embedding, to detect and repair heterogeneous dirty data. In contrast to previous approaches, which either focus on filling in missing data or clean only errors that violate a limited set of rules, KGClean enables (i) cleaning both missing data and other erroneous values, and (ii) mining potential rules automatically, which expands the coverage of error detection. KGClean first learns data representations with TransGAT, an effective knowledge graph embedding model that gathers the neighborhood information of each data item and incorporates the interactions among data items to cast the data into continuous vector spaces with rich semantics. KGClean integrates an active learning-based classification model, which identifies errors from a small seed of labels. KGClean uses an efficient PRO-repair strategy to repair errors based on a novel concept of propagation power. Extensive experiments on four typical knowledge graphs demonstrate the effectiveness of KGClean in practice.
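
To make the detect-then-repair idea in the abstract concrete, here is a minimal sketch of embedding-based triple checking. It is not the paper's method: it swaps in a plain TransE-style distance in place of TransGAT, a fixed threshold in place of the active learning classifier, and a nearest-candidate lookup in place of PRO-repair. The embeddings, entity names, threshold, and function names are all hypothetical and hand-picked for illustration.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration.
# They are not learned and do not come from the paper.
ENTITY = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
RELATION = {"capital_of": (0.0, 1.0)}


def score(head, rel, tail):
    """TransE-style distance ||h + r - t||; lower means more plausible."""
    h, r, t = ENTITY[head], RELATION[rel], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def detect_errors(triples, threshold=0.5):
    """Flag triples whose distance exceeds an assumed fixed threshold."""
    return [tr for tr in triples if score(*tr) > threshold]


def suggest_tail(head, rel, candidates):
    """Pick the candidate tail entity with the smallest distance."""
    return min(candidates, key=lambda tail: score(head, rel, tail))


triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "France"),  # deliberately wrong tail
]
suspicious = detect_errors(triples)
repairs = {
    tr: suggest_tail(tr[0], tr[1], ["France", "Germany"]) for tr in suspicious
}
print(suspicious)
print(repairs)
```

In this toy setup only the second triple is flagged, and the suggested replacement tail is "Germany". A learned embedding model and a trained classifier would take the place of the hand-set vectors and the fixed threshold.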
