BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Cited: 0
Authors
Liu, Tianfeng [1 ,3 ,4 ]
Chen, Yangrui [2 ,3 ]
Li, Dan [1 ,4 ]
Wu, Chuan [2 ]
Zhu, Yibo [3 ]
He, Jun [3 ]
Peng, Yanghua [3 ]
Chen, Hongzheng [3 ,5 ]
Chen, Hongzhi [3 ]
Guo, Chuanxiong [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] ByteDance, Beijing, Peoples R China
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Cornell Univ, Ithaca, NY USA
Funding
National Natural Science Foundation of China;
Keywords
SYSTEM;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs. The main bottleneck is the process of preparing data for GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature-retrieval traffic. By co-designing the caching policy with the order of sampling, we find a sweet spot between low overhead and a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 1.9x on average.
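The first idea in the abstract, co-designing the caching policy with the sampling order, can be illustrated with a minimal sketch. This is not BGL's actual implementation: the FIFO cache, the BFS-based visiting order, and all names here (FeatureCache, bfs_sampling_order, etc.) are hypothetical stand-ins, chosen only to show why ordering samples for locality raises the hit ratio of a small feature cache.

```python
# Illustrative sketch only; BGL's real cache engine and ordering are
# more sophisticated. All names and policies below are hypothetical.
from collections import OrderedDict

class FeatureCache:
    """A simple FIFO feature cache over node IDs, standing in for a
    dynamic cache engine that avoids remote feature fetches."""
    def __init__(self, capacity, feature_store):
        self.capacity = capacity
        self.store = feature_store      # fallback: remote/CPU feature table
        self.cache = OrderedDict()      # node_id -> feature vector
        self.hits = self.misses = 0

    def get(self, node_id):
        if node_id in self.cache:
            self.hits += 1
            return self.cache[node_id]
        self.misses += 1
        feat = self.store[node_id]      # the expensive fetch we want to avoid
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict oldest entry (FIFO)
        self.cache[node_id] = feat
        return feat

def bfs_sampling_order(adj, seeds):
    """Visit seed nodes in BFS order so that consecutive mini-batches
    share neighborhoods; this temporal locality is what lets a cheap
    cache policy achieve a high hit ratio (the co-design idea)."""
    order, seen = [], set()
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            if u in seen:
                continue
            seen.add(u)
            order.append(u)
            nxt.extend(adj.get(u, []))
        frontier = nxt
    return order

# Toy usage: a path graph. Visiting nodes in BFS order keeps each node's
# neighbors close together in time, so even a 2-entry cache hits often.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {i: [float(i)] * 4 for i in adj}
cache = FeatureCache(capacity=2, feature_store=features)
for v in bfs_sampling_order(adj, seeds=[0]):
    cache.get(v)
    for nb in adj[v]:
        cache.get(nb)
print(f"hit ratio: {cache.hits / (cache.hits + cache.misses):.2f}")
```

The point of the sketch is the interaction, not the individual pieces: a FIFO cache alone performs poorly under random sampling, but once the sampling order is chosen for locality, even this trivially cheap policy keeps the hit ratio high, which is the low-overhead/high-hit-ratio sweet spot the abstract describes.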
Pages: 103-118
Page count: 16