BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Cited: 0
Authors
Liu, Tianfeng [1 ,3 ,4 ]
Chen, Yangrui [2 ,3 ]
Li, Dan [1 ,4 ]
Wu, Chuan [2 ]
Zhu, Yibo [3 ]
He, Jun [3 ]
Peng, Yanghua [3 ]
Chen, Hongzheng [3 ,5 ]
Chen, Hongzhi [3 ]
Guo, Chuanxiong [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] ByteDance, Beijing, Peoples R China
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Cornell Univ, Ithaca, NY USA
Funding
National Natural Science Foundation of China;
Keywords
SYSTEM;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs. The main bottleneck is the process of preparing data for GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature-retrieval traffic. By co-designing the caching policy with the order of sampling, we find a sweet spot between low overhead and a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 1.9x on average.
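The first idea in the abstract, co-designing the caching policy with the sampling order, can be illustrated with a minimal sketch. This is not BGL's actual implementation: the FIFO cache, the BFS-based visiting order, and all names here (FeatureCache, bfs_sampling_order, etc.) are hypothetical stand-ins, chosen only to show why ordering samples for locality raises the hit ratio of a small feature cache.

```python
# Illustrative sketch only; BGL's real cache engine and ordering are
# more sophisticated. All names and policies below are hypothetical.
from collections import OrderedDict

class FeatureCache:
    """A simple FIFO feature cache over node IDs, standing in for a
    dynamic cache engine that avoids remote feature fetches."""
    def __init__(self, capacity, feature_store):
        self.capacity = capacity
        self.store = feature_store      # fallback: remote/CPU feature table
        self.cache = OrderedDict()      # node_id -> feature vector
        self.hits = self.misses = 0

    def get(self, node_id):
        if node_id in self.cache:
            self.hits += 1
            return self.cache[node_id]
        self.misses += 1
        feat = self.store[node_id]      # the expensive fetch we want to avoid
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict oldest entry (FIFO)
        self.cache[node_id] = feat
        return feat

def bfs_sampling_order(adj, seeds):
    """Visit seed nodes in BFS order so that consecutive mini-batches
    share neighborhoods; this temporal locality is what lets a cheap
    cache policy achieve a high hit ratio (the co-design idea)."""
    order, seen = [], set()
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            if u in seen:
                continue
            seen.add(u)
            order.append(u)
            nxt.extend(adj.get(u, []))
        frontier = nxt
    return order

# Toy usage: a path graph. Visiting nodes in BFS order keeps each node's
# neighbors close together in time, so even a 2-entry cache hits often.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {i: [float(i)] * 4 for i in adj}
cache = FeatureCache(capacity=2, feature_store=features)
for v in bfs_sampling_order(adj, seeds=[0]):
    cache.get(v)
    for nb in adj[v]:
        cache.get(nb)
print(f"hit ratio: {cache.hits / (cache.hits + cache.misses):.2f}")
```

The point of the sketch is the interaction, not the individual pieces: a FIFO cache alone performs poorly under random sampling, but once the sampling order is chosen for locality, even this trivially cheap policy keeps the hit ratio high, which is the low-overhead/high-hit-ratio sweet spot the abstract describes.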
Pages: 103-118
Page count: 16