BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Cited by: 0
Authors
Liu, Tianfeng [1, 3, 4]
Chen, Yangrui [2, 3]
Li, Dan [1, 4]
Wu, Chuan [2 ]
Zhu, Yibo [3 ]
He, Jun [3 ]
Peng, Yanghua [3 ]
Chen, Hongzheng [3, 5]
Chen, Hongzhi [3 ]
Guo, Chuanxiong [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] ByteDance, Beijing, Peoples R China
[4] Zhongguancun Lab, Beijing, Peoples R China
[5] Cornell Univ, Ithaca, NY USA
Funding
National Natural Science Foundation of China
Keywords
SYSTEM
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient at training on large graphs with billions of nodes and edges using GPUs. The main bottlenecks are the two steps that prepare data for GPUs: subgraph sampling and feature retrieval. This paper proposes BGL, a distributed GNN training system designed to address these bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature-retrieval traffic. By co-designing the caching policy and the sampling order, we find a sweet spot that combines low overhead with a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between the different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems, by 1.9x on average.
Pages: 103-118
Page count: 16