Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Cited by: 2
Authors
Zhang, Lizhi [1 ]
Lu, Kai [1 ]
Lai, Zhiquan [1 ]
Fu, Yongquan [1 ]
Tang, Yu [1 ]
Li, Dongsheng [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Training; Graphics processing units; Graph neural networks; Loading; Pipelines; Distributed databases; Social networking (online); pipeline parallel; data parallel; sampling; dataloading; cache;
DOI
10.1109/TC.2023.3305077
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains on graph data. As graphs grow increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to lower resource demands, but they suffer from long memory access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast, efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN-training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU memory to reach a cache hit rate of up to 100%, avoiding redundant data transfer between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training on an 8-GPU cluster, 2PGraph achieves up to an 8.7x speedup over state-of-the-art approaches.
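
To make the feature-caching idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of GPU-resident feature caching for sampled mini-batches. It is not 2PGraph's implementation: the GPUFeatureCache class, the choice of "hot" vertices, and the pinned host-memory assumption are illustrative only. The point it shares with the abstract is that features of frequently sampled vertices stay on the GPU, so only cache misses cross the CPU-GPU link during feature aggregation.

# Illustrative sketch (not 2PGraph's actual code): keep features of frequently
# sampled vertices resident on the GPU and fetch only cache misses from host memory.
import torch

class GPUFeatureCache:
    def __init__(self, cpu_feats: torch.Tensor, hot_ids: torch.Tensor, device: str = "cuda"):
        # cpu_feats: full feature matrix in (ideally pinned) host memory, shape (num_nodes, dim)
        # hot_ids:   vertex ids chosen for caching, e.g. by degree or sampling frequency (assumption)
        self.cpu_feats = cpu_feats
        self.device = device
        self.gpu_feats = cpu_feats[hot_ids].to(device)               # cached feature rows
        num_nodes = cpu_feats.shape[0]
        # map global vertex id -> slot in the GPU cache; -1 means "not cached"
        self.slot = torch.full((num_nodes,), -1, dtype=torch.long, device=device)
        self.slot[hot_ids.to(device)] = torch.arange(hot_ids.numel(), device=device)

    def gather(self, batch_ids: torch.Tensor) -> torch.Tensor:
        # Gather input features for the vertices of one sampled mini-batch.
        batch_ids = batch_ids.to(self.device)
        slots = self.slot[batch_ids]
        hit = slots >= 0
        out = torch.empty(batch_ids.numel(), self.cpu_feats.shape[1],
                          dtype=self.cpu_feats.dtype, device=self.device)
        out[hit] = self.gpu_feats[slots[hit]]                         # served from the GPU cache
        miss_ids = batch_ids[~hit].cpu()                              # only misses cross PCIe
        out[~hit] = self.cpu_feats[miss_ids].to(self.device, non_blocking=True)
        return out

# Hypothetical usage: cache the most frequently sampled vertices, then gather per batch.
#   cache = GPUFeatureCache(features.pin_memory(), hot_ids=top_frequency_ids)
#   batch_feats = cache.gather(sampled_node_ids)

2PGraph's own caching is described as GNN-layer-aware and reaches hit rates of up to 100% on available GPU resources; the sketch above shows only the generic cache mechanism such a design builds on.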
Pages: 3473-3488
Page count: 16
Related Papers
50 records in total
  • [31] SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters
    Cao, Jing
    Zhu, Zongwei
    Zhou, Xuehai
    2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021), 2021, : 94 - 102
  • [32] iMap: Incremental Node Mapping between Large Graphs Using GNN
    Xia, Yikuan
    Gao, Jun
    Cui, Bin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 2191 - 2200
  • [33] SCGraph: Accelerating Sample-based GNN Training by Staged Caching of Features on GPUs
    He, Yuqi
    Lai, Zhiquan
    Ran, Zhejiang
    Zhang, Lizhi
    Li, Dongsheng
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 106 - 113
  • [34] DGS: Communication-Efficient Graph Sampling for Distributed GNN Training
    Wan, Xinchen
    Chen, Kai
    Zhang, Yiming
    2022 IEEE 30TH INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (ICNP 2022), 2022,
  • [35] AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs
    Zhou, Yangjie
    Song, Yaoxu
    Leng, Jingwen
    Liu, Zihan
    Cui, Weihao
    Zhang, Zhendong
    Guo, Cong
    Chen, Quan
    Li, Li
    Guo, Minyi
    PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, 2023, : 52 - 62
  • [36] Accelerating Distributed MoE Training and Inference with Lina
    Li, Jiamin
    Jiang, Yimin
    Zhu, Yibo
    Wang, Cong
    Xu, Hong
    PROCEEDINGS OF THE 2023 USENIX ANNUAL TECHNICAL CONFERENCE, 2023, : 945 - 959
  • [37] Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration
    Luo, Ziyue
    Bao, Yixin
    Wu, Chuan
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 890 - 899
  • [38] TGL: A General Framework for Temporal GNN Training on Billion-Scale Graphs
    Zhou, Hongkuan
    Zheng, Da
    Nisa, Israt
    Ioannidis, Vasileios
    Song, Xiang
    Karypis, George
PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (08): 1572 - 1580
  • [39] Accelerating Large Scale Real-Time GNN Inference using Channel Pruning
    Zhou, Hongkuan
    Srivastava, Ajitesh
Zeng, Hanqing
    Kannan, Rajgopal
    Prasanna, Viktor
PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (09): 1597 - 1605
  • [40] ACCELERATING FRAMEWORK FOR SIMULTANEOUS OPTIMIZATION OF MODEL ARCHITECTURES AND TRAINING HYPERPARAMETERS
    Choi, Youngjoon
    Choi, Jongwon
    Moon, Hankyu
    Lee, Jaeyoung
    Chang, Jin-Yeop
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3831 - 3835