Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Cited by: 2
Authors
Zhang, Lizhi [1]
Lu, Kai [1]
Lai, Zhiquan [1]
Fu, Yongquan [1]
Tang, Yu [1]
Li, Dongsheng [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Training; Graphics processing units; Graph neural networks; Loading; Pipelines; Distributed databases; Social networking (online); pipeline parallel; data parallel; sampling; dataloading; cache;
DOI
10.1109/TC.2023.3305077
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains involving graph data. As graphs grow increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to reduce resource demands, but they suffer from long memory-access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast, efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU memory to reach a cache hit rate of up to 100%, avoiding redundant data transfer between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory-access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training on an 8-GPU cluster, 2PGraph achieves up to an 8.7x speedup over state-of-the-art approaches.
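As a rough illustration of the feature-caching idea described in the abstract, the following PyTorch sketch keeps a chosen subset of vertex features resident in GPU memory and transfers only each mini-batch's cache misses from host memory. This is a minimal sketch under stated assumptions, not 2PGraph's implementation: the class GPUFeatureCache, its gather method, and the degree-based cache selection in the usage lines are hypothetical stand-ins, and the paper's GNN-layer-aware policy for choosing which vertices to cache is not reproduced here.

import torch

class GPUFeatureCache:
    """Illustrative sketch (not 2PGraph's code): keep features of a chosen
    vertex subset on the GPU and fetch only cache misses from host memory."""

    def __init__(self, cpu_feats, cached_ids, device="cuda"):  # assumes a CUDA device
        self.device = device
        self.cpu_feats = cpu_feats                          # full feature table in host memory
        self.gpu_feats = cpu_feats[cached_ids].to(device)   # cached rows resident on the GPU
        # slot[v] = row of vertex v inside gpu_feats, or -1 if v is not cached
        self.slot = torch.full((cpu_feats.size(0),), -1, dtype=torch.long)
        self.slot[cached_ids] = torch.arange(cached_ids.numel())

    def gather(self, batch_ids):
        """Return the feature rows for batch_ids, assembled on the GPU."""
        slots = self.slot[batch_ids]          # CPU-side lookup of cache slots
        hit = slots >= 0
        out = torch.empty(batch_ids.numel(), self.cpu_feats.size(1),
                          dtype=self.cpu_feats.dtype, device=self.device)
        hit_dev = hit.to(self.device)
        # cache hits: read directly from GPU memory, no CPU-GPU traffic
        out[hit_dev] = self.gpu_feats[slots[hit].to(self.device)]
        # cache misses: the only rows that cross the CPU-GPU link
        miss_ids = batch_ids[~hit]
        if miss_ids.numel() > 0:
            out[~hit_dev] = self.cpu_feats[miss_ids].to(self.device, non_blocking=True)
        return out

# Usage sketch: cache the highest-degree vertices (a simple stand-in for the
# paper's GNN-layer-aware selection) and gather features for one mini-batch.
feats = torch.randn(1_000_000, 128)
degree = torch.randint(1, 100, (1_000_000,))
cache = GPUFeatureCache(feats, torch.topk(degree, 100_000).indices)
batch_feats = cache.gather(torch.randint(0, 1_000_000, (1024,)))

The point of the sketch is only the hit/miss split: cached rows are read from GPU memory, so the fraction of each batch that crosses the CPU-GPU link shrinks as the cache hit rate grows, which is the effect behind the up-to-100% hit rate and up-to-99% reduction in data transfer time reported in the abstract.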
Pages: 3473-3488
Number of pages: 16
Related Papers
50 records in total
  • [1] Accelerating Distributed GNN Training by Codes
    Wang, Yanhong
    Guan, Tianchan
    Niu, Dimin
    Zou, Qiaosha
    Zheng, Hongzhong
    Shi, C. -J. Richard
    Xie, Yuan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (09) : 2598 - 2614
  • [2] 2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters
    Zhang, Lizhi
    Lai, Zhiquan
    Li, Shengwei
    Tang, Yu
    Liu, Feng
    Li, Dongsheng
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021), 2021, : 103 - 113
  • [3] HGL: Accelerating Heterogeneous GNN Training with Holistic Representation and Optimization
    Gui, Yuntao
    Wu, Yidi
    Yang, Han
    Jin, Tatiana
    Li, Boyang
    Zhou, Qihui
    Cheng, James
    Yu, Fan
    [J]. SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [4] PCGraph: Accelerating GNN Inference on Large Graphs via Partition Caching
    Zhang, Lizhi
    Lai, Zhiquan
    Tang, Yu
    Li, Dongsheng
    Liu, Feng
    Luo, Xiaochun
    [J]. 19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 279 - 287
  • [5] Accelerating GNN Training on CPU+Multi-FPGA Heterogeneous Platform
    Lin, Yi-Chien
    Zhang, Bingyi
    Prasanna, Viktor
    [J]. HIGH PERFORMANCE COMPUTING, CARLA 2022, 2022, 1660 : 16 - 30
  • [6] Accelerating network applications by distributed interfaces on heterogeneous multiprocessor architectures
    Cascón, Pablo
    Ortiz, Andrés
    Ortega, Julio
    Díaz, Antonio F.
    Rojas, Ignacio
    [J]. JOURNAL OF SUPERCOMPUTING, 2011, 58 (03): 302 - 313
  • [7] Auto-Divide GNN: Accelerating GNN Training with Subgraph Division
    Chen, Hongyu
    Ran, Zhejiang
    Ge, Keshi
    Lai, Zhiquan
    Jiang, Jingfei
    Li, Dongsheng
    [J]. EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 367 - 382
  • [8] Accelerating Large-scale Image Retrieval on Heterogeneous Architectures with Spark
    Wang, Hanli
    Xiao, Bo
    Wang, Lei
    Wu, Jun
    [J]. MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 1023 - 1026
  • [9] Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration in Heterogeneous Systems
    Luo, Ziyue
    Bao, Yixin
    Wu, Chuan
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, : 1 - 15