fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU

Cited by: 0
Authors
Chen, Zhaodong [1 ]
Yan, Mingyu [1 ]
Zhu, Maohua [1 ]
Deng, Lei [1 ]
Li, Guoqi [2 ]
Li, Shuangchen [3 ]
Xie, Yuan [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Science Foundation (USA);
DOI
10.1145/3400302.3415610
CLC Classification
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Graph convolutional neural networks (GNNs) have achieved state-of-the-art performance on tasks like node classification and have become a new workload family in data centers. GNNs work on irregular graph-structured data with three distinct phases: Combination, Graph Processing, and Aggregation. While the Combination phase is well supported by sgemm kernels in cuBLAS, the other two phases remain inefficient on GPGPU due to the lack of optimized CUDA kernels. In particular, the Aggregation phase introduces a large DRAM storage footprint and heavy data movement, and both the Aggregation and Graph Processing phases suffer from high kernel launching time. These inefficiencies not only decrease training throughput but also prevent users from training GNNs on larger graphs on GPGPU. Although recent studies have partially alleviated these problems, their optimizations are still insufficient. In this paper, we propose fuseGNN, an extension of PyTorch that provides highly optimized APIs and CUDA kernels for GNNs. First, two different programming abstractions for the Aggregation phase are used to handle graphs with different average degrees. Second, dedicated GPGPU kernels are developed for Aggregation and Graph Processing in both forward and backward passes, applying kernel fusion and other optimization strategies to reduce kernel launching time and latency and to exploit data-reuse opportunities. Evaluation on multiple benchmarks shows that fuseGNN achieves up to 5.3x end-to-end speedup over state-of-the-art frameworks and reduces the DRAM storage footprint by several orders of magnitude on large datasets.
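To make the three phases named in the abstract concrete, here is a minimal dense NumPy sketch of one GCN layer. This is an illustration of the computation pattern only, not the fuseGNN API: the function name `gcn_layer` and the toy graph are invented for this example, and fuseGNN's contribution is replacing the Aggregation and Graph Processing steps with fused sparse CUDA kernels rather than dense matrix products.

```python
import numpy as np

def gcn_layer(A_hat, X, W):
    """One GCN layer on a normalized adjacency matrix A_hat.

    Combination: dense feature transform X @ W (an sgemm, well served by cuBLAS).
    Aggregation: neighborhood reduction A_hat @ H (sparse gather/scatter on GPUs,
    the phase fuseGNN targets with dedicated fused kernels).
    """
    H = X @ W          # Combination phase
    return A_hat @ H   # Aggregation phase

# Toy 3-node path graph. Graph Processing would derive these normalized
# edge weights (with self-loops) from the raw adjacency matrix.
A_hat = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])
X = np.eye(3)        # one-hot node features
W = np.ones((3, 2))  # toy weight matrix

out = gcn_layer(A_hat, X, W)
print(out.shape)  # (3, 2)
```

Because each row of `A_hat` sums to 1 and every transformed feature here is 1, the output is an all-ones matrix; with real data, Aggregation mixes each node's features with its neighbors'.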
Pages: 9
Related Papers
50 items total
  • [21] A deep graph convolutional neural network architecture for graph classification
    Zhou, Yuchen
    Huo, Hongtao
    Hou, Zhiwen
    Bu, Fanliang
    PLOS BIOLOGY, 2023, 21 (03)
  • [22] WETLAND MAPPING BY JOINTLY USE OF CONVOLUTIONAL NEURAL NETWORK AND GRAPH CONVOLUTIONAL NETWORK
    Jafarzadeh, Hamid
    Mahdianpari, Masoud
    Gill, Eric
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 2219 - 2222
  • [23] Accelerating Neural Network Training: A Brief Review
    Nokhwal, Sahil
    Chilakalapudi, Priyanka
    Donekal, Preeti
    Nokhwal, Suman
    Pahune, Saurabh
    Chaudhary, Ankit
    2024 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS, METAHEURISTICS & SWARM INTELLIGENCE, ISMSI 2024, 2024, : 31 - 35
  • [24] A Convolutional Neural Network and Graph Convolutional Network Based Framework for AD Classification
    Lin, Lan
    Xiong, Min
    Zhang, Ge
    Kang, Wenjie
    Sun, Shen
    Wu, Shuicai
    SENSORS, 2023, 23 (04)
  • [25] Accelerating network layouts using graph neural networks
    Both, Csaba
    Dehmamy, Nima
    Yu, Rose
    Barabasi, Albert-Laszlo
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [26] Accelerating Virtual Network Embedding with Graph Neural Networks
    Habibi, Farzad
    Dolati, Mahdi
    Khonsari, Ahmad
    Ghaderi, Majid
    2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020,
  • [28] DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training
    Chen, Zhengbo
    Yu, Qi
    Zheng, Fang
    Guo, Feng
    Chen, Zuoning
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [29] PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network
    Tian, Chao
    Ma, Lingxiao
    Yang, Zhi
    Dai, Yafei
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 936 - 945
  • [30] Accelerating Deep Convolutional Neural Network base on stochastic computing
    Sadi, Mohamad Hasani
    Mahani, Ali
    INTEGRATION-THE VLSI JOURNAL, 2021, 76 : 113 - 121