fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU

Cited by: 0
Authors
Chen, Zhaodong [1 ]
Yan, Mingyu [1 ]
Zhu, Maohua [1 ]
Deng, Lei [1 ]
Li, Guoqi [2 ]
Li, Shuangchen [3 ]
Xie, Yuan [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Science Foundation (USA);
DOI
10.1145/3400302.3415610
CLC Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Graph convolutional neural networks (GNNs) have achieved state-of-the-art performance on tasks like node classification and have become a new workload family in data centers. GNNs work on irregular graph-structured data with three distinct phases: Combination, Graph Processing, and Aggregation. While the Combination phase is well supported by sgemm kernels in cuBLAS, the other two phases remain inefficient on GPGPU due to the lack of optimized CUDA kernels. In particular, the Aggregation phase introduces a large DRAM storage footprint and heavy data movement, and both the Aggregation and Graph Processing phases suffer from high kernel launching time. These inefficiencies not only decrease training throughput but also prevent users from training GNNs on larger graphs on GPGPU. Although recent studies have partially alleviated these problems, their optimizations are still insufficient. In this paper, we propose fuseGNN, an extension of PyTorch that provides highly optimized APIs and CUDA kernels for GNNs. First, two different programming abstractions for the Aggregation phase are used to handle graphs with different average degrees. Second, dedicated GPGPU kernels are developed for Aggregation and Graph Processing in both forward and backward passes, applying kernel fusion and other optimization strategies to reduce kernel launching time and latency and to exploit data-reuse opportunities. Evaluation on multiple benchmarks shows that fuseGNN achieves up to 5.3x end-to-end speedup over state-of-the-art frameworks and reduces the DRAM storage footprint by several orders of magnitude on large datasets.
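To make the three phases named in the abstract concrete, here is a minimal dense NumPy sketch of one GCN layer. This is an illustration of the computation pattern only, not the fuseGNN API: the function name `gcn_layer` and the toy graph are invented for this example, and fuseGNN's contribution is replacing the Aggregation and Graph Processing steps with fused sparse CUDA kernels rather than dense matrix products.

```python
import numpy as np

def gcn_layer(A_hat, X, W):
    """One GCN layer on a normalized adjacency matrix A_hat.

    Combination: dense feature transform X @ W (an sgemm, well served by cuBLAS).
    Aggregation: neighborhood reduction A_hat @ H (sparse gather/scatter on GPUs,
    the phase fuseGNN targets with dedicated fused kernels).
    """
    H = X @ W          # Combination phase
    return A_hat @ H   # Aggregation phase

# Toy 3-node path graph. Graph Processing would derive these normalized
# edge weights (with self-loops) from the raw adjacency matrix.
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])
X = np.eye(3)        # one-hot node features
W = np.ones((3, 2))  # toy weight matrix

out = gcn_layer(A_hat, X, W)
print(out.shape)  # (3, 2)
```

Because each row of `A_hat` sums to 1 and every transformed feature here is 1, the output is an all-ones matrix; with real data, Aggregation mixes each node's features with its neighbors'.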
Pages: 9
Related Papers
50 items total
  • [21] A deep graph convolutional neural network architecture for graph classification
    Zhou, Yuchen
    Huo, Hongtao
    Hou, Zhiwen
    Bu, Fanliang
    PLOS BIOLOGY, 2023, 21 (03)
  • [22] WETLAND MAPPING BY JOINTLY USE OF CONVOLUTIONAL NEURAL NETWORK AND GRAPH CONVOLUTIONAL NETWORK
    Jafarzadeh, Hamid
    Mahdianpari, Masoud
    Gill, Eric
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2219 - 2222
  • [23] Accelerating Neural Network Training: A Brief Review
    Nokhwal, Sahil
    Chilakalapudi, Priyanka
    Donekal, Preeti
    Nokhwal, Suman
    Pahune, Saurabh
    Chaudhary, Ankit
    2024 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, METAHEURISTICS & SWARM INTELLIGENCE, ISMSI 2024, 2024, : 31 - 35
  • [24] A Convolutional Neural Network and Graph Convolutional Network Based Framework for AD Classification
    Lin, Lan
    Xiong, Min
    Zhang, Ge
    Kang, Wenjie
    Sun, Shen
    Wu, Shuicai
    SENSORS, 2023, 23 (04)
  • [25] Accelerating network layouts using graph neural networks
    Both, Csaba
    Dehmamy, Nima
    Yu, Rose
    Barabasi, Albert-Laszlo
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [26] Accelerating Virtual Network Embedding with Graph Neural Networks
    Habibi, Farzad
    Dolati, Mahdi
    Khonsari, Ahmad
    Ghaderi, Majid
    2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020,
  • [28] DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training
    Chen, Zhengbo
    Yu, Qi
    Zheng, Fang
    Guo, Feng
    Chen, Zuoning
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [29] PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network
    Tian, Chao
    Ma, Lingxiao
    Yang, Zhi
    Dai, Yafei
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 936 - 945
  • [30] Accelerating Deep Convolutional Neural Network base on stochastic computing
    Sadi, Mohamad Hasani
    Mahani, Ali
    INTEGRATION-THE VLSI JOURNAL, 2021, 76 : 113 - 121