Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Cited by: 2
Authors
Zhang, Lizhi [1]
Lu, Kai [1]
Lai, Zhiquan [1]
Fu, Yongquan [1]
Tang, Yu [1]
Li, Dongsheng [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Training; Graphics processing units; Graph neural networks; Loading; Pipelines; Distributed databases; Social networking (online); pipeline parallel; data parallel; sampling; dataloading; cache;
DOI
10.1109/TC.2023.3305077
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains involving graph data. As graphs grow increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to reduce resource demands, but they suffer from long memory-access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast, efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU memory to reach a cache hit rate of up to 100%, avoiding redundant data transfer between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory-access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training on an 8-GPU cluster, 2PGraph achieves up to an 8.7x speedup over state-of-the-art approaches.
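As a rough illustration of the feature-caching idea described in the abstract, the following PyTorch sketch keeps a chosen subset of vertex features resident in GPU memory and transfers only each mini-batch's cache misses from host memory. This is a minimal sketch under stated assumptions, not 2PGraph's implementation: the class GPUFeatureCache, its gather method, and the degree-based cache selection in the usage lines are hypothetical stand-ins, and the paper's GNN-layer-aware policy for choosing which vertices to cache is not reproduced here.

import torch

class GPUFeatureCache:
    """Illustrative sketch (not 2PGraph's code): keep features of a chosen
    vertex subset on the GPU and fetch only cache misses from host memory."""

    def __init__(self, cpu_feats, cached_ids, device="cuda"):  # assumes a CUDA device
        self.device = device
        self.cpu_feats = cpu_feats                          # full feature table in host memory
        self.gpu_feats = cpu_feats[cached_ids].to(device)   # cached rows resident on the GPU
        # slot[v] = row of vertex v inside gpu_feats, or -1 if v is not cached
        self.slot = torch.full((cpu_feats.size(0),), -1, dtype=torch.long)
        self.slot[cached_ids] = torch.arange(cached_ids.numel())

    def gather(self, batch_ids):
        """Return the feature rows for batch_ids, assembled on the GPU."""
        slots = self.slot[batch_ids]          # CPU-side lookup of cache slots
        hit = slots >= 0
        out = torch.empty(batch_ids.numel(), self.cpu_feats.size(1),
                          dtype=self.cpu_feats.dtype, device=self.device)
        hit_dev = hit.to(self.device)
        # cache hits: read directly from GPU memory, no CPU-GPU traffic
        out[hit_dev] = self.gpu_feats[slots[hit].to(self.device)]
        # cache misses: the only rows that cross the CPU-GPU link
        miss_ids = batch_ids[~hit]
        if miss_ids.numel() > 0:
            out[~hit_dev] = self.cpu_feats[miss_ids].to(self.device, non_blocking=True)
        return out

# Usage sketch: cache the highest-degree vertices (a simple stand-in for the
# paper's GNN-layer-aware selection) and gather features for one mini-batch.
feats = torch.randn(1_000_000, 128)
degree = torch.randint(1, 100, (1_000_000,))
cache = GPUFeatureCache(feats, torch.topk(degree, 100_000).indices)
batch_feats = cache.gather(torch.randint(0, 1_000_000, (1024,)))

The point of the sketch is only the hit/miss split: cached rows are read from GPU memory, so the fraction of each batch that crosses the CPU-GPU link shrinks as the cache hit rate grows, which is the effect behind the up-to-100% hit rate and up-to-99% reduction in data transfer time reported in the abstract.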
Pages: 3473-3488
Number of pages: 16
Related Papers
50 records in total
  • [1] Accelerating Distributed GNN Training by Codes
    Wang, Yanhong
    Guan, Tianchan
    Niu, Dimin
    Zou, Qiaosha
    Zheng, Hongzhong
    Shi, C. -J. Richard
    Xie, Yuan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (09) : 2598 - 2614
  • [2] 2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters
    Zhang, Lizhi
    Lai, Zhiquan
    Li, Shengwei
    Tang, Yu
    Liu, Feng
    Li, Dongsheng
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2021), 2021, : 103 - 113
  • [3] HGL: Accelerating Heterogeneous GNN Training with Holistic Representation and Optimization
    Gui, Yuntao
    Wu, Yidi
    Yang, Han
    Jin, Tatiana
    Li, Boyang
    Zhou, Qihui
    Cheng, James
    Yu, Fan
    [J]. SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [4] PCGraph: Accelerating GNN Inference on Large Graphs via Partition Caching
    Zhang, Lizhi
    Lai, Zhiquan
    Tang, Yu
    Li, Dongsheng
    Liu, Feng
    Luo, Xiaochun
    [J]. 19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 279 - 287
  • [5] Accelerating GNN Training on CPU+Multi-FPGA Heterogeneous Platform
    Lin, Yi-Chien
    Zhang, Bingyi
    Prasanna, Viktor
    [J]. HIGH PERFORMANCE COMPUTING, CARLA 2022, 2022, 1660 : 16 - 30
  • [6] Accelerating network applications by distributed interfaces on heterogeneous multiprocessor architectures
    Cascón, Pablo
    Ortiz, Andrés
    Ortega, Julio
    Díaz, Antonio F.
    Rojas, Ignacio
    [J]. JOURNAL OF SUPERCOMPUTING, 2011, 58 (03): 302 - 313
  • [7] Auto-Divide GNN: Accelerating GNN Training with Subgraph Division
    Chen, Hongyu
    Ran, Zhejiang
    Ge, Keshi
    Lai, Zhiquan
    Jiang, Jingfei
    Li, Dongsheng
    [J]. EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 367 - 382
  • [8] Accelerating Large-scale Image Retrieval on Heterogeneous Architectures with Spark
    Wang, Hanli
    Xiao, Bo
    Wang, Lei
    Wu, Jun
    [J]. MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 1023 - 1026
  • [9] Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration in Heterogeneous Systems
    Luo, Ziyue
    Bao, Yixin
    Wu, Chuan
    [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, : 1 - 15