Paper title
Towards a Flexible Embedding Learning Framework
Paper authors
Abstract
Representation learning is a fundamental building block for analyzing entities in a database. While existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these methods make pre-determined assumptions about the type of semantics captured by the learned embeddings, and these assumptions may not align well with specific downstream tasks. In this work, we propose an embedding learning framework that 1) uses an input format that is agnostic to input data type, 2) is flexible in terms of the relationships that can be embedded into the learned representations, and 3) provides an intuitive pathway to incorporate domain knowledge into the embedding learning process. Our proposed framework takes a set of entity-relation-matrices as input, which quantify the affinities among different entities in the database. Moreover, a sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings. To complete the representation learning toolbox, we also outline a simple yet effective post-processing technique to properly visualize the learned embeddings. Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms the existing state-of-the-art approaches in various data mining tasks.
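The abstract does not specify the matrix layout or the sampling mechanism, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes an entity-relation-matrix can be held as a dense table of non-negative affinities between two entity sets, and that training pairs are drawn with probability proportional to those affinities. The function name `sample_pairs`, the toy author and venue identifiers, and the affinity values are all hypothetical.

```python
import random


def sample_pairs(matrix, row_entities, col_entities, k, seed=0):
    """Draw k (row_entity, col_entity) pairs, each with probability
    proportional to its affinity in `matrix` (assumed non-negative)."""
    rng = random.Random(seed)
    pairs = []
    weights = []
    for i, row in enumerate(matrix):
        for j, affinity in enumerate(row):
            if affinity > 0:  # zero-affinity pairs are never sampled
                pairs.append((row_entities[i], col_entities[j]))
                weights.append(affinity)
    return rng.choices(pairs, weights=weights, k=k)


# Hypothetical toy data: author-by-venue affinities (e.g. publication counts).
authors = ["a1", "a2", "a3"]
venues = ["v1", "v2"]
affinity = [
    [3.0, 0.0],
    [1.0, 2.0],
    [0.0, 4.0],
]

samples = sample_pairs(affinity, authors, venues, k=1000)
```

Under this assumption, pairs with larger affinity appear more often in the training stream, which is one simple way the input matrix could shape what the embeddings capture. Swapping in a different matrix (for example author-by-keyword) would change the encoded relationship without changing the sampling code.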