BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Authors
Liu, Tianfeng [1, 3, 4]
Chen, Yangrui [2, 3]
Li, Dan [1, 4]
Wu, Chuan [2]
Zhu, Yibo [3]
He, Jun [3]
Peng, Yanghua [3]
Chen, Hongzheng [3, 5]
Chen, Hongzhi [3]
Guo, Chuanxiong [3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] ByteDance, Beijing, Peoples R China
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Cornell Univ, Ithaca, NY USA
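Funding
National Natural Science Foundation of China
Keywords
SYSTEM
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods]
Discipline classification code
081202
Abstract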
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training on large graphs with billions of nodes and edges using GPUs. The main bottlenecks lie in preparing data for the GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature-retrieval traffic. By co-designing the caching policy and the order of sampling, we find a sweet spot that combines low overhead with a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between the different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 1.9x on average.
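The interaction between caching policy and sampling order described in the abstract can be illustrated with a toy sketch. The abstract does not name the policy or the ordering, so everything here is an assumption made for illustration: a FIFO eviction policy, the hypothetical `FIFOFeatureCache` class and `replay` helper, and hand-made mini-batches. It is not BGL's implementation; it only shows that the same mini-batches can yield very different hit ratios depending on the order in which they are processed.

```python
from collections import OrderedDict


class FIFOFeatureCache:
    """Toy fixed-capacity feature cache with first-in-first-out eviction.

    Hypothetical illustration only; FIFO is an assumed policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # node_id -> feature, kept in insertion order
        self.hits = 0
        self.misses = 0

    def get(self, node_id, fetch):
        """Return the feature for node_id, calling fetch(node_id) on a miss."""
        if node_id in self.store:
            self.hits += 1
            return self.store[node_id]
        self.misses += 1
        feature = fetch(node_id)  # stands in for a remote feature lookup
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the oldest inserted entry
        self.store[node_id] = feature
        return feature

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(batches, capacity):
    """Feed mini-batches of node IDs through a fresh cache; return the hit ratio."""
    cache = FIFOFeatureCache(capacity)
    for batch in batches:
        for node_id in batch:
            cache.get(node_id, fetch=lambda n: [float(n)])
    return cache.hit_ratio()


# Four mini-batches drawn from two neighbourhoods of a made-up graph.
a, b = [0, 1, 2], [1, 2, 3]          # neighbourhood 1 (overlapping nodes)
c, d = [10, 11, 12], [11, 12, 13]    # neighbourhood 2 (overlapping nodes)

locality_order = [a, b, c, d]   # consecutive batches share nodes
scattered_order = [a, c, b, d]  # consecutive batches share nothing

ratio_locality = replay(locality_order, capacity=3)
ratio_scattered = replay(scattered_order, capacity=3)
print(ratio_locality, ratio_scattered)
```

With a cache of three entries, the locality-preserving order reuses features across consecutive batches, while the scattered order evicts every feature before it is needed again. A cheap policy such as FIFO can therefore still reach a useful hit ratio if the sampling order keeps temporally close batches close in the graph.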
Pages: 103-118
Page count: 16