Accelerating GNN Training by Adapting Large Graphs to Distributed Heterogeneous Architectures

Cited by: 2
Authors
Zhang, Lizhi [1]
Lu, Kai [1]
Lai, Zhiquan [1]
Fu, Yongquan [1]
Tang, Yu [1]
Li, Dongsheng [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Technol, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Training; Graphics processing units; Graph neural networks; Loading; Pipelines; Distributed databases; Social networking (online); pipeline parallel; data parallel; sampling; dataloading; cache;
DOI
10.1109/TC.2023.3305077
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Graph neural networks (GNNs) have been successfully applied to many important application domains on graph data. As graphs become increasingly large, existing GNN training frameworks typically use mini-batch sampling during feature aggregation to lower resource burdens, but they suffer from long memory-access latency and inefficient transfer of vertex features from CPU to GPU. This paper proposes 2PGraph, a system that addresses these limitations of mini-batch sampling and feature aggregation and supports fast and efficient single-GPU and distributed GNN training. First, 2PGraph presents a locality-aware GNN training scheduling method that schedules vertices based on the locality of the graph topology, significantly accelerating sampling and aggregation, improving the data locality of vertex accesses, and limiting the range of neighborhood expansion. Second, 2PGraph proposes a GNN-layer-aware feature caching method that uses available GPU memory to reach a cache hit rate of up to 100%, avoiding redundant data transfers between CPU and GPU. Third, 2PGraph presents a self-dependence cluster-based graph partitioning method that achieves high sampling and cache efficiency in distributed environments. Experimental results on real-world graph datasets show that 2PGraph reduces the memory-access latency of mini-batch sampling by up to 90% and data transfer time by up to 99%. For distributed GNN training over an 8-GPU cluster, 2PGraph achieves up to 8.7x speedup over state-of-the-art approaches.
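The abstract only outlines these mechanisms. As a purely illustrative aid, and not a reproduction of 2PGraph's implementation, the minimal PyTorch sketch below shows the general idea behind GPU-side feature caching for mini-batch GNN training: features of presumed-hot vertices (approximated here by degree, a stand-in heuristic) are kept in GPU memory, and a mini-batch gather falls back to CPU memory only for uncached vertices. All function names, the degree heuristic, and the synthetic data are assumptions made for the sketch.

# Illustrative sketch only -- not 2PGraph's code. Cache features of
# high-degree vertices on the GPU so that mini-batch feature gathering
# touches CPU memory only on cache misses. Requires a CUDA device.
import torch

def build_degree_cache(feats_cpu, degrees, cache_size, device="cuda"):
    # Select the cache_size highest-degree vertices as the "hot" set
    # and copy their feature rows to GPU memory.
    hot = torch.topk(degrees, cache_size).indices          # hot vertex ids (CPU)
    gpu_cache = feats_cpu[hot].to(device)                  # cached feature rows
    # gpu_slot[v] = row of v in gpu_cache, or -1 if v is not cached.
    gpu_slot = torch.full((feats_cpu.size(0),), -1, dtype=torch.long, device=device)
    gpu_slot[hot.to(device)] = torch.arange(cache_size, device=device)
    return gpu_cache, gpu_slot

def gather_features(batch_ids, feats_cpu, gpu_cache, gpu_slot, device="cuda"):
    # Gather the feature rows of one mini-batch: cache hits are copied
    # GPU-to-GPU, misses are gathered on the CPU and transferred to the GPU.
    batch_ids = batch_ids.to(device)
    slots = gpu_slot[batch_ids]
    hit = slots >= 0
    out = torch.empty(batch_ids.size(0), feats_cpu.size(1), device=device)
    out[hit] = gpu_cache[slots[hit]]
    miss_ids = batch_ids[~hit].cpu()
    out[~hit] = feats_cpu[miss_ids].to(device, non_blocking=True)
    return out

if __name__ == "__main__":
    n, d = 100_000, 128
    feats = torch.randn(n, d).pin_memory()                 # CPU-resident feature table
    degrees = torch.randint(1, 1000, (n,))                 # synthetic degrees
    cache, slot = build_degree_cache(feats, degrees, cache_size=10_000)
    batch = torch.randint(0, n, (1024,))                   # sampled mini-batch ids
    print(gather_features(batch, feats, cache, slot).shape)

According to the abstract, it is 2PGraph's GNN-layer-aware caching policy combined with locality-aware scheduling that pushes the hit rate toward 100%; the degree heuristic above is only a generic stand-in used to keep the sketch self-contained.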
Pages: 3473 - 3488
Number of pages: 16
Related Papers
50 items in total
  • [21] Entropy Aware Training for Fast and Accurate Distributed GNN
    Deshmukh, Dimly
    Gupta, Gagan Raj
    Chawla, Manisha
    Jatala, Vishwesh
    Haldar, Anirban
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 986 - 991
  • [22] DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning
    Li, Zhiyuan
    Jian, Xun
    Wang, Yue
    Shao, Yingxia
    Chen, Lei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (06): 1364 - 1376
  • [23] FlexGraph: A Flexible and Efficient Distributed Framework for GNN Training
    Wang, Lei
    Yin, Qiang
    Tian, Chao
    Yang, Jianbang
    Chen, Rong
    Yu, Wenyuan
    Yao, Zihang
    Zhou, Jingren
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 67 - 82
  • [24] DGCL: An Efficient Communication Library for Distributed GNN Training
    Cai, Zhenkun
    Yan, Xiao
    Wu, Yidi
    Ma, Kaihao
    Cheng, James
    Yu, Fan
    PROCEEDINGS OF THE SIXTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '21), 2021, : 130 - 144
  • [25] NeutronStar: Distributed GNN Training with Hybrid Dependency Management
    Wang, Qiange
    Zhang, Yanfeng
    Wang, Hao
    Chen, Chaoyi
    Zhang, Xiaodong
    Yu, Ge
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1301 - 1315
  • [26] XAgg: Accelerating Heterogeneous Distributed Training Through XDP-Based Gradient Aggregation
    Zhang, Qianyu
    Zhao, Gongming
    Xu, Hongli
    Yang, Peng
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (03) : 2174 - 2188
  • [27] Scheduling Methods for Accelerating Applications on Architectures with Heterogeneous Cores
    Chen, Linchuan
    Huo, Xin
    Agrawal, Gagan
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 48 - 57
  • [28] Accelerating Large-Scale Distributed Neural Network Training with SPMD Parallelism
    Zhang, Shiwei
    Diao, Lansong
    Wu, Chuan
    Wang, Siyu
    Lin, Wei
    PROCEEDINGS OF THE 13TH SYMPOSIUM ON CLOUD COMPUTING, SOCC 2022, 2022, : 403 - 418
  • [29] HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture
    Lin, Yi-Chien
    Prasanna, Viktor
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 557 - 567
  • [30] HPH: Hybrid Parallelism on Heterogeneous Clusters for Accelerating Large-scale DNNs Training
    Duan, Yabo
    Lai, Zhiquan
    Li, Shengwei
    Liu, Weijie
    Ge, Keshi
    Liang, Peng
    Li, Dongsheng
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 313 - 323