BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing

Cited by: 0

Authors:

Liu, Tianfeng ^{[1,3,4]}

Chen, Yangrui ^{[2,3]}

Li, Dan ^{[1,4]}

Wu, Chuan ^{[2]}

Zhu, Yibo ^{[3]}

He, Jun ^{[3]}

Peng, Yanghua ^{[3]}

Chen, Hongzheng ^{[3,5]}

Chen, Hongzhi ^{[3]}

Guo, Chuanxiong ^{[3]}

Affiliations:

[1] Tsinghua Univ, Beijing, Peoples R China

[2] Univ Hong Kong, Hong Kong, Peoples R China

[3] ByteDance, Beijing, Peoples R China

[4] Zhongguancun Lab, Beijing, Peoples R China

[5] Cornell Univ, Ithaca, NY USA

Source:

PROCEEDINGS OF THE 20TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION, NSDI 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

SYSTEM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction. Nonetheless, existing systems are inefficient to train large graphs with billions of nodes and edges with GPUs. The main bottlenecks are the process of preparing data for GPUs - subgraph sampling and feature retrieving. This paper proposes BGL, a distributed GNN training system designed to address the bottlenecks with a few key ideas. First, we propose a dynamic cache engine to minimize feature retrieving traffic. By co-designing caching policy and the order of sampling, we find a sweet spot of low overhead and a high cache hit ratio. Second, we improve the graph partition algorithm to reduce cross-partition communication during subgraph sampling. Finally, careful resource isolation reduces contention between different data preprocessing stages. Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 1.9x on average.

Pages: 103-118

Page count: 16
