CoGNN: An Algorithm-Hardware Co-Design Approach to Accelerate GNN Inference With Minibatch Sampling

Cited by: 3
Authors
Zhong, Kai [1]
Zeng, Shulin [1]
Hou, Wentao [1,2]
Dai, Guohao [3]
Zhu, Zhenhua [1]
Zhang, Xuecang [4]
Xiao, Shihai [4]
Yang, Huazhong [1]
Wang, Yu [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
[2] Univ Wisconsin, Comp Sci Dept, Madison, WI USA
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai 200240, Peoples R China
[4] Huawei Technol Co Ltd, 2012 Lab, Shenzhen 518063, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Neural networks; Graph neural networks; Computer architecture; System-on-chip; Quantization (signal); Inference algorithms; Hardware acceleration; Accelerator; graph neural network (GNN); quantization; sampling; PERFORMANCE;
DOI
10.1109/TCAD.2023.3279302
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
As a new class of graph embedding algorithms, graph neural networks (GNNs) have been widely used in many fields. However, GNN computing combines the characteristics of sparse graph processing and dense neural networks, which makes it difficult to deploy GNNs efficiently on existing graph processing accelerators or neural network accelerators. Recently, some GNN accelerators have been proposed, but the following challenges have not been fully solved: 1) the minibatch GNN inference scenario offers software-hardware co-design opportunities that can reduce the computation amount by 30%, but this potential is not well exploited; moreover, the cost of message flow graph construction is large and may account for more than 50% of the total latency; 2) feature aggregation involves a large amount of data access but relatively little computation, leading to low on-chip data reuse, only 10% of that achieved by dense computation; and 3) without optimization of the sparse computing units, a simple memory-bank and crossbar architecture easily causes bank access conflicts and load imbalance, reducing the utilization of computing units to less than 60%. To solve these problems, we propose an algorithm-hardware co-design scheme to accelerate GNN inference, comprising three techniques: 1) a reuse-aware sampling method for minibatch inference scenarios, which reduces computation by 30% and improves the on-chip reusability of local data; 2) nodewise parallelism-aware quantization, which quantizes features and weights to 8-bit or 4-bit integers and reduces memory access by at least 4x; and 3) an accelerator supporting the above techniques, in which different operations are handled by a sampling-inference integrated architecture. A multibank on-chip memory pool supports data reuse, and edge stream reordering reduces data access conflicts, improving the utilization of computing units by 1.5x. Combining these techniques, experiments show that our design achieves a 9.2x speedup and a 29x energy efficiency improvement over the Deep Graph Library framework running on servers equipped with CPUs and GPUs.
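The abstract only names the three techniques; as a concrete illustration of techniques 1) and 2), the following is a minimal Python sketch assuming a plain adjacency-list graph. The reuse heuristic shown (ranking a target's neighbors by how many other targets in the same minibatch also reference them) and the symmetric scale-based INT8 quantizer, along with the names reuse_aware_sample and quantize_int8, are hypothetical stand-ins, not the paper's actual algorithms.

import random
from collections import defaultdict

def reuse_aware_sample(adj, batch, fanout, seed=0):
    """Keep `fanout` neighbors per target node, preferring neighbors that
    several targets in the same minibatch share, so each shared feature
    vector is fetched from off-chip memory once and reused on chip."""
    rng = random.Random(seed)
    ref_count = defaultdict(int)  # how many targets in this batch reference each neighbor
    for v in batch:
        for u in adj[v]:
            ref_count[u] += 1
    sampled = {}
    for v in batch:
        nbrs = list(adj[v])
        rng.shuffle(nbrs)  # randomize tie-breaking among equally shared neighbors
        nbrs.sort(key=lambda u: ref_count[u], reverse=True)  # most-shared first
        sampled[v] = nbrs[:fanout]
    return sampled

def quantize_int8(values, scale):
    """Symmetric quantization of a float feature vector to signed 8-bit ints."""
    return [max(-128, min(127, round(v / scale))) for v in values]

# Toy usage: neighbor 4 is shared by all three targets, so it survives
# sampling for each of them and its features need to be loaded only once.
adj = {0: [2, 3, 4], 1: [3, 4, 5], 6: [4, 5, 7]}
print(reuse_aware_sample(adj, batch=[0, 1, 6], fanout=2))
print(quantize_int8([0.12, -0.50, 0.31], scale=0.01))  # -> [12, -50, 31]

In the toy batch, the shared neighbor's feature vector is fetched once instead of three times; this is the kind of on-chip reuse the paper's sampling method is designed to expose.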
Pages: 4883-4896
Number of pages: 14