Paper Title
Self-supervised Learning on Graphs: Deep Insights and New Direction
Paper Authors
Paper Abstract
The success of deep learning notoriously requires large amounts of costly annotated data. This has led to the development of self-supervised learning (SSL), which aims to alleviate this limitation by creating domain-specific pretext tasks on unlabeled data. Simultaneously, there is increasing interest in generalizing deep learning to the graph domain in the form of graph neural networks (GNNs). GNNs naturally utilize unlabeled nodes through simple neighborhood aggregation, but aggregation alone cannot thoroughly make use of unlabeled nodes. Thus, we seek to harness SSL for GNNs to fully exploit the unlabeled data. Unlike data instances in the image and text domains, nodes in graphs carry unique structural information and are inherently linked, and hence are not independent and identically distributed (i.i.d.). Such complexity is a double-edged sword for SSL on graphs. On the one hand, it makes it challenging to adapt solutions from the image and text domains to graphs, so dedicated efforts are required. On the other hand, it provides rich information that enables us to build SSL from a variety of perspectives. Thus, in this paper, we first deepen our understanding of when, why, and which strategies of SSL work with GNNs by empirically studying numerous basic SSL pretext tasks on graphs. Inspired by insights from these empirical studies, we propose a new direction, SelfTask, for building advanced pretext tasks that achieve state-of-the-art performance on various real-world datasets. The specific experimental settings to reproduce our results can be found at \url{https://github.com/ChandlerBang/SelfTask-GNN}.
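To make the abstract's core idea concrete, here is a minimal sketch of a structural SSL pretext task on a graph. This is not the paper's SelfTask method; the toy graph, random features, and the choice of degree prediction as the pretext target are illustrative assumptions. The point it shows is that a pretext target (here, each node's degree) can be derived from the unlabeled graph structure alone and combined with neighborhood aggregation to give every node a training signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected graph of 5 nodes (adjacency matrix) with random features.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(5, 4))  # node feature matrix (assumed, random here)

# One round of mean neighborhood aggregation -- the basic GNN building block
# the abstract refers to, which uses unlabeled nodes only implicitly.
A_hat = A + np.eye(5)                               # add self-loops
H = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X  # averaged neighbor features

# A basic structural pretext target that requires NO labels: node degree.
degrees = A.sum(axis=1)

# Linear pretext head (illustrative); its squared error is the SSL loss.
w = rng.normal(size=4)
pretext_loss = np.mean((H @ w - degrees) ** 2)

# Joint objective: supervised loss on the few labeled nodes, plus a
# weighted self-supervised loss computed over ALL nodes.
def joint_loss(supervised_loss, pretext_loss, lam=0.5):
    return supervised_loss + lam * pretext_loss
```

In a real system the pretext head and the GNN would be trained jointly by gradient descent, so the gradients from `pretext_loss` shape the representations of unlabeled nodes as well as labeled ones.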