Paper Title

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling

Authors

Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin

Abstract

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art method for graph-based learning tasks. However, training GCNs at scale is still challenging, hindering both the exploration of more sophisticated GCN architectures and their applications to real-world large graphs. While it might be natural to consider graph partitioning and distributed training for tackling this challenge, prior works have only scratched the surface of this direction due to the limitations of existing designs. In this work, we first analyze why distributed GCN training is ineffective and identify the underlying cause to be the excessive number of boundary nodes in each partitioned subgraph, which easily explodes the memory and communication costs of GCN training. Furthermore, we propose a simple yet effective method dubbed BNS-GCN that adopts random Boundary-Node-Sampling to enable efficient and scalable distributed GCN training. Experiments and ablation studies consistently validate the effectiveness of BNS-GCN, e.g., boosting the throughput by up to 16.2x and reducing the memory usage by up to 58%, while maintaining full-graph accuracy. Furthermore, both theoretical and empirical analyses show that BNS-GCN enjoys better convergence than existing sampling-based methods. We believe that our BNS-GCN has opened up a new paradigm for enabling GCN training at scale. The code is available at https://github.com/RICE-EIC/BNS-GCN.
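The core idea described in the abstract can be illustrated with a toy sketch: after partitioning a graph, the boundary nodes are exactly the endpoints of partition-crossing edges, and BNS keeps only a random fraction of them each epoch to cut communication and memory. The graph, partition, and sampling probability `p` below are illustrative assumptions, not the authors' actual implementation or API.

```python
import random

# Toy graph: two triangles {0,1,2} and {3,4,5} linked by the edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # node -> partition id

# Boundary nodes are endpoints of edges whose two ends lie in
# different partitions; only they require cross-partition communication.
boundary = set()
for u, v in edges:
    if partition[u] != partition[v]:
        boundary.update((u, v))

# BNS idea: each epoch, independently keep each boundary node with
# probability p, so the communicated set shrinks to roughly p * |boundary|.
p = 0.5
random.seed(0)
sampled = {n for n in boundary if random.random() < p}

print(sorted(boundary))  # here only nodes 2 and 3 are boundary nodes
print(sorted(sampled))   # the random subset exchanged this epoch
```

In this tiny example only two of the six nodes are boundary nodes; the paper's observation is that in real partitioned graphs this boundary set can dominate memory and communication, which is why subsampling it per epoch pays off.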
