Paper title
Heterogeneous Graph Contrastive Multi-view Learning
Paper authors
Abstract
Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on heterogeneous information networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design a contrastive objective that fully captures their rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate sampling bias in heterogeneous GCL is another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as views, and propose a contrastive objective that maximizes the mutual information between any pair of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
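The idea of treating metapath-induced subgraphs as views and contrasting every pair of them can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy paper-author and paper-subject incidence matrices, the helper names (`metapath_adjacency`, `encode`, `info_nce`, `multiview_loss`), the single shared mean-aggregation encoder, and the use of InfoNCE as a stand-in lower bound on mutual information. The positive sampling strategy is not modelled here; the only positive for a node is the same node in the other view.

```python
import itertools
import numpy as np

def metapath_adjacency(*incidences):
    """Compose incidence matrices along a metapath (e.g. P-A then A-P gives P-A-P)."""
    product = incidences[0]
    for matrix in incidences[1:]:
        product = product @ matrix
    adjacency = (product > 0).astype(float)  # keep only reachability, not path counts
    np.fill_diagonal(adjacency, 0.0)         # drop self-loops
    return adjacency

def encode(adjacency, features, weight):
    """One mean-aggregation layer: a toy stand-in for a GNN encoder (assumption)."""
    with_self = adjacency + np.eye(adjacency.shape[0])
    normalized = with_self / with_self.sum(axis=1, keepdims=True)
    hidden = np.tanh(normalized @ features @ weight)
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    return hidden / (norms + 1e-12)          # unit-length rows, so dot product = cosine

def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE loss: node i in view_b is the positive for node i in view_a."""
    logits = view_a @ view_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def multiview_loss(views, temperature=0.5):
    """Average InfoNCE over every ordered pair of views."""
    pairs = list(itertools.permutations(range(len(views)), 2))
    return sum(info_nce(views[i], views[j], temperature) for i, j in pairs) / len(pairs)

# Hypothetical toy HIN: 4 papers, 3 authors, 2 subjects.
paper_author = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]])
paper_subject = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

pap = metapath_adjacency(paper_author, paper_author.T)    # Paper-Author-Paper view
psp = metapath_adjacency(paper_subject, paper_subject.T)  # Paper-Subject-Paper view

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5))  # random node features (assumption)
weight = rng.normal(size=(5, 3))    # encoder weight shared across views (assumption)

views = [encode(adj, features, weight) for adj in (pap, psp)]
loss = multiview_loss(views)
print(f"pairwise multi-view InfoNCE loss: {loss:.4f}")
```

In a real training loop the encoder weights would be optimized to minimize this loss with an autodiff framework; the sketch only evaluates the objective once to show how the views and the pairwise terms fit together.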