Paper Title

Taxonomy Enrichment with Text and Graph Vector Representations

Paper Authors

Irina Nikishina, Mikhail Tikhomirov, Varvara Logacheva, Yuriy Nazarov, Alexander Panchenko, Natalia Loukachevitch

Paper Abstract

Knowledge graphs such as DBpedia, Freebase or Wikidata always contain a taxonomic backbone that allows the arrangement and structuring of various concepts in accordance with the hypo-hypernym ("class-subclass") relationship. With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread. In this paper, we address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy. We present a new method that allows achieving high results on this task with little effort. It uses the resources which exist for the majority of languages, making the method universal. We extend our method by incorporating deep representations of graph structures like node2vec, Poincaré embeddings, GCN etc. that have recently demonstrated promising results on various NLP tasks. Furthermore, combining these representations with word embeddings allows us to beat the state of the art. We conduct a comprehensive study of the existing approaches to taxonomy enrichment based on word and graph vector representations and their fusion approaches. We also explore the ways of using deep learning architectures to extend the taxonomic backbones of knowledge graphs. We create a number of datasets for taxonomy extension for English and Russian. We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.
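
The abstract describes fusing word embeddings with graph-based node representations (node2vec, Poincaré embeddings, GCN) to attach new words to an existing taxonomy. Below is a minimal, self-contained sketch of one such fusion strategy: concatenating pre-computed word and graph vectors and ranking candidate hypernyms by cosine similarity. The function names, the nearest-neighbour averaging used to approximate the missing graph vector of the new word, and the toy vectors are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (NOT the authors' implementation) of combining word-level and
# graph-level vectors for taxonomy enrichment: given a new word, rank existing
# taxonomy nodes as candidate hypernyms using fused (concatenated) embeddings.
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def rank_hypernym_candidates(query_word, word_vecs, graph_vecs, candidates, top_k=3):
    """Rank candidate hypernyms for a word that is not yet in the taxonomy.

    word_vecs:  dict mapping a word to its distributional (e.g. fastText-style) vector
    graph_vecs: dict mapping an existing taxonomy node to its graph (e.g. node2vec-style) vector

    The query word has no graph vector, so here it is approximated by averaging the
    graph vectors of the taxonomy nodes whose word vectors are closest to the query --
    one simple fusion heuristic, assumed here for illustration.
    """
    q_word = l2_normalize(word_vecs[query_word])

    # Word-level similarity to every candidate, used to pick the query's neighbours.
    word_sims = {c: float(q_word @ l2_normalize(word_vecs[c])) for c in candidates}
    nearest = sorted(word_sims, key=word_sims.get, reverse=True)[:5]

    # Approximate the missing graph vector of the query from its nearest neighbours.
    q_graph = l2_normalize(np.mean([graph_vecs[c] for c in nearest], axis=0))
    query_fused = np.concatenate([q_word, q_graph])

    # Score each candidate by cosine similarity in the fused (word + graph) space.
    scored = []
    for c in candidates:
        fused = np.concatenate([l2_normalize(word_vecs[c]), l2_normalize(graph_vecs[c])])
        cos = float(query_fused @ fused) / (np.linalg.norm(query_fused) * np.linalg.norm(fused))
        scored.append((c, cos))
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy example with random vectors, only to show the expected input shapes.
    rng = np.random.default_rng(0)
    dim_w, dim_g = 8, 4
    taxonomy_nodes = ["animal", "dog", "vehicle", "car"]
    word_vecs = {w: rng.normal(size=dim_w) for w in taxonomy_nodes + ["puppy"]}
    graph_vecs = {w: rng.normal(size=dim_g) for w in taxonomy_nodes}
    print(rank_hypernym_candidates("puppy", word_vecs, graph_vecs, taxonomy_nodes))
```

In the paper's setting the word vectors would come from pretrained embeddings available for most languages, and the graph vectors from the taxonomy itself; the concatenation above is just one of several possible fusion schemes mentioned in the abstract.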
