Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Cited by: 2
Authors
Zhang, Lizhi [1]
Lu, Kai [1]
Lai, Zhiquan [1]
Fu, Yongquan [1]
Tang, Yu [1]
Li, Dongsheng [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Training; Graphics processing units; Graph neural networks; Loading; Pipelines; Distributed databases; Social networking (online); pipeline parallel; data parallel; sampling; dataloading; cache;
DOI
10.1109/TC.2023.3305077
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains on graph data. As graphs become increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to lower resource burdens, but they suffer from long memory-access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast and efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU memory to reach a cache hit rate of up to 100%, avoiding redundant data transfers between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory-access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training over an 8-GPU cluster, 2PGraph achieves up to 8.7x speedup over state-of-the-art approaches.
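The abstract only outlines these mechanisms. As a purely illustrative aid, and not a reproduction of 2PGraph's implementation, the minimal PyTorch sketch below shows the general idea behind GPU-side feature caching for mini-batch GNN training: features of presumed-hot vertices (approximated here by degree, a stand-in heuristic) are kept in GPU memory, and a mini-batch gather falls back to CPU memory only for uncached vertices. All function names, the degree heuristic, and the synthetic data are assumptions made for the sketch.

# Illustrative sketch only -- not 2PGraph's code. Cache features of
# high-degree vertices on the GPU so that mini-batch feature gathering
# touches CPU memory only on cache misses. Requires a CUDA device.
import torch

def build_degree_cache(feats_cpu, degrees, cache_size, device="cuda"):
    # Select the cache_size highest-degree vertices as the "hot" set
    # and copy their feature rows to GPU memory.
    hot = torch.topk(degrees, cache_size).indices          # hot vertex ids (CPU)
    gpu_cache = feats_cpu[hot].to(device)                  # cached feature rows
    # gpu_slot[v] = row of v in gpu_cache, or -1 if v is not cached.
    gpu_slot = torch.full((feats_cpu.size(0),), -1, dtype=torch.long, device=device)
    gpu_slot[hot.to(device)] = torch.arange(cache_size, device=device)
    return gpu_cache, gpu_slot

def gather_features(batch_ids, feats_cpu, gpu_cache, gpu_slot, device="cuda"):
    # Gather the feature rows of one mini-batch: cache hits are copied
    # GPU-to-GPU, misses are gathered on the CPU and transferred to the GPU.
    batch_ids = batch_ids.to(device)
    slots = gpu_slot[batch_ids]
    hit = slots >= 0
    out = torch.empty(batch_ids.size(0), feats_cpu.size(1), device=device)
    out[hit] = gpu_cache[slots[hit]]
    miss_ids = batch_ids[~hit].cpu()
    out[~hit] = feats_cpu[miss_ids].to(device, non_blocking=True)
    return out

if __name__ == "__main__":
    n, d = 100_000, 128
    feats = torch.randn(n, d).pin_memory()                 # CPU-resident feature table
    degrees = torch.randint(1, 1000, (n,))                 # synthetic degrees
    cache, slot = build_degree_cache(feats, degrees, cache_size=10_000)
    batch = torch.randint(0, n, (1024,))                   # sampled mini-batch ids
    print(gather_features(batch, feats, cache, slot).shape)

According to the abstract, it is 2PGraph's GNN-layer-aware caching policy combined with locality-aware scheduling that pushes the hit rate toward 100%; the degree heuristic above is only a generic stand-in used to keep the sketch self-contained.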
Pages: 3473 - 3488
Number of pages: 16
Related Papers
50 items in total
  • [21] Entropy Aware Training for Fast and Accurate Distributed GNN
    Deshmukh, Dimly
    Gupta, Gagan Raj
    Chawla, Manisha
    Jatala, Vishwesh
    Haldar, Anirban
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 986 - 991
  • [22] DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning
    Li, Zhiyuan
    Jian, Xun
    Wang, Yue
    Shao, Yingxia
    Chen, Lei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (06): 1364 - 1376
  • [23] FlexGraph: A Flexible and Efficient Distributed Framework for GNN Training
    Wang, Lei
    Yin, Qiang
    Tian, Chao
    Yang, Jianbang
    Chen, Rong
    Yu, Wenyuan
    Yao, Zihang
    Zhou, Jingren
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 67 - 82
  • [24] DGCL: An Efficient Communication Library for Distributed GNN Training
    Cai, Zhenkun
    Yan, Xiao
    Wu, Yidi
    Ma, Kaihao
    Cheng, James
    Yu, Fan
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 130 - 144
  • [25] NeutronStar: Distributed GNN Training with Hybrid Dependency Management
    Wang, Qiange
    Zhang, Yanfeng
    Wang, Hao
    Chen, Chaoyi
    Zhang, Xiaodong
    Yu, Ge
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1301 - 1315
  • [26] XAgg: Accelerating Heterogeneous Distributed Training Through XDP-Based Gradient Aggregation
    Zhang, Qianyu
    Zhao, Gongming
    Xu, Hongli
    Yang, Peng
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (03) : 2174 - 2188
  • [27] Scheduling Methods for Accelerating Applications on Architectures with Heterogeneous Cores
    Chen, Linchuan
    Huo, Xin
    Agrawal, Gagan
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 48 - 57
  • [28] Accelerating Large-Scale Distributed Neural Network Training with SPMD Parallelism
    Zhang, Shiwei
    Diao, Lansong
    Wu, Chuan
    Wang, Siyu
    Lin, Wei
    PROCEEDINGS OF THE 13TH SYMPOSIUM ON CLOUD COMPUTING, SOCC 2022, 2022, : 403 - 418
  • [29] HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture
    Lin, Yi-Chien
    Prasanna, Viktor
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 557 - 567
  • [30] HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training
    Duan, Yabo
    Lai, Zhiquan
    Li, Shengwei
    Liu, Weijie
    Ge, Keshi
    Liang, Peng
    Li, Dongsheng
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 313 - 323