CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

Cited by: 3
Authors
Zhong, Kai [1]
Zeng, Shulin [1]
Hou, Wentao [1,2]
Dai, Guohao [3]
Zhu, Zhenhua [1]
Zhang, Xuecang [4]
Xiao, Shihai [4]
Yang, Huazhong [1]
Wang, Yu [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Univ Wisconsin, Comp Sci Dept, Madison, WI USA
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai 200240, Peoples R China
[4] Huawei Technol Co Ltd, 2012 Lab, Shenzhen 518063, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Neural networks; Graph neural networks; Computer architecture; System-on-chip; Quantization (signal); Inference algorithms; Hardware acceleration; Accelerator; graph neural network (GNN); quantization; sampling; PERFORMANCE;
DOI
10.1109/TCAD.2023.3279302
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
As a new class of graph embedding algorithms, graph neural networks (GNNs) have been widely used in many fields. However, GNN computing combines the characteristics of sparse graph processing and dense neural networks, which makes it difficult to deploy GNNs efficiently on existing graph processing accelerators or neural network accelerators. Recently, some GNN accelerators have been proposed, but the following challenges have not been fully solved: 1) the minibatch GNN inference scenario offers software-hardware co-design opportunities that can reduce the computation amount by 30%, but this potential is not well exploited; moreover, the cost of message flow graph construction is large and may account for more than 50% of the total latency; 2) feature aggregation involves a large amount of data access but relatively little computation, leading to low on-chip data reuse, only 10% of that achieved by dense computation; and 3) without optimization of the sparse computing units, a simple memory-bank and crossbar architecture easily causes bank access conflicts and load imbalance, reducing the utilization of computing units to less than 60%. To solve these problems, we propose an algorithm-hardware co-design scheme to accelerate GNN inference, comprising three techniques: 1) a reuse-aware sampling method for minibatch inference scenarios, which reduces computation by 30% and improves the on-chip reusability of local data; 2) nodewise parallelism-aware quantization, which quantizes features and weights to 8-bit or 4-bit integers and reduces memory access by at least 4x; and 3) an accelerator supporting the above techniques, in which different operations are handled by a sampling-inference integrated architecture. A multibank on-chip memory pool supports data reuse, and edge stream reordering reduces data access conflicts, improving the utilization of computing units by 1.5x. Combining these techniques, experiments show that our design achieves a 9.2x speedup and a 29x energy efficiency improvement over the Deep Graph Library framework running on servers equipped with CPUs and GPUs.
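The abstract only names the three techniques; as a concrete illustration of techniques 1) and 2), the following is a minimal Python sketch assuming a plain adjacency-list graph. The reuse heuristic shown (ranking a target's neighbors by how many other targets in the same minibatch also reference them) and the symmetric scale-based INT8 quantizer, along with the names reuse_aware_sample and quantize_int8, are hypothetical stand-ins, not the paper's actual algorithms.

import random
from collections import defaultdict

def reuse_aware_sample(adj, batch, fanout, seed=0):
    """Keep `fanout` neighbors per target node, preferring neighbors that
    several targets in the same minibatch share, so each shared feature
    vector is fetched from off-chip memory once and reused on chip."""
    rng = random.Random(seed)
    ref_count = defaultdict(int)  # how many targets in this batch reference each neighbor
    for v in batch:
        for u in adj[v]:
            ref_count[u] += 1
    sampled = {}
    for v in batch:
        nbrs = list(adj[v])
        rng.shuffle(nbrs)  # randomize tie-breaking among equally shared neighbors
        nbrs.sort(key=lambda u: ref_count[u], reverse=True)  # most-shared first
        sampled[v] = nbrs[:fanout]
    return sampled

def quantize_int8(values, scale):
    """Symmetric quantization of a float feature vector to signed 8-bit ints."""
    return [max(-128, min(127, round(v / scale))) for v in values]

# Toy usage: neighbor 4 is shared by all three targets, so it survives
# sampling for each of them and its features need to be loaded only once.
adj = {0: [2, 3, 4], 1: [3, 4, 5], 6: [4, 5, 7]}
print(reuse_aware_sample(adj, batch=[0, 1, 6], fanout=2))
print(quantize_int8([0.12, -0.50, 0.31], scale=0.01))  # -> [12, -50, 31]

In the toy batch, the shared neighbor's feature vector is fetched once instead of three times; this is the kind of on-chip reuse the paper's sampling method is designed to expose.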
Pages: 4883-4896
Number of pages: 14