Graph-OPU: A Highly Flexible FPGA-Based Overlay Processor for Graph Neural Networks

Cited: 0
Authors
Tang, Enhao [1 ]
Li, Shun [1 ]
Chen, Ruiqi [1 ]
Zhou, Hao [1 ]
Zhang, Haoyang [1 ]
Ma, Yuhanxiao [2 ]
Yu, Jun [1 ]
Wang, Kun [1 ]
Affiliations
[1] Fudan Univ, Sch Microelect, Shanghai, Peoples R China
[2] NYU, New York, NY USA
Keywords
Graph Neural Networks; Custom Processor; Hardware Accelerator
DOI
10.1145/3691636
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Field-programmable gate arrays (FPGAs) are ideal candidates for accelerating graph neural networks (GNNs). However, the FPGA redeployment process is time-consuming when updating or switching between diverse GNN models across different applications. Existing GNN processors eliminate the need for FPGA redeployment when switching between GNN models, but adapting to different matrix multiplication types by switching processing units decreases hardware utilization. In addition, DDR memory bandwidth limits further improvements in hardware performance. This article proposes Graph-OPU, a highly flexible FPGA-based overlay processor for GNN acceleration. Graph-OPU provides excellent flexibility and programmability, as the executable code of GNN models is automatically compiled and reloaded without requiring FPGA redeployment. First, we customize the compiler and instruction sets for the inference process of different GNN models. Second, we customize the datapath and optimize the data format in the microarchitecture to fully leverage the advantages of high-bandwidth memory (HBM). Third, we design a unified matrix multiplication unit that handles both sparse-dense matrix multiplication (SpMM) and general matrix multiplication (GEMM), enhancing Graph-OPU performance. During Graph-OPU execution, the computational units are shared between SpMM and GEMM rather than switched, which improves hardware utilization. Finally, we implement a hardware prototype on the Xilinx Alveo U50 and test mainstream GNN models on various datasets. Experimental results show that Graph-OPU achieves speedups of up to 1,654x and 63x, and energy-efficiency gains of up to 5,305x and 422x, over CPU and GPU implementations, respectively. Graph-OPU outperforms state-of-the-art (SOTA) end-to-end overlay accelerators for GNNs, reducing latency by an average of 1.36x and improving energy efficiency by an average of 1.41x. Moreover, Graph-OPU achieves an average 1.45x improvement in end-to-end latency over the SOTA GNN processor. Graph-OPU represents an
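The abstract's key architectural idea is that SpMM (sparse adjacency times dense features) and GEMM (dense weight multiplication) can share one set of multiply-accumulate units rather than requiring separate, switched datapaths. The software sketch below illustrates that idea only in principle; it is not the paper's microarchitecture, and the function and variable names are our own. Treating a dense operand as a fully populated CSR matrix lets a single inner loop serve both cases:

```python
# Illustrative sketch (not the paper's actual design): a single multiply-
# accumulate loop serves both SpMM and GEMM when every left-hand operand
# is expressed in CSR form. A dense matrix is just a CSR matrix with no
# zeros skipped, so no "unit switching" is needed between the two modes.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def unified_matmul(csr_a, dense_b, n_cols_b):
    """Compute A @ B where A is in CSR form (sparse or dense), B is dense."""
    values, col_idx, row_ptr = csr_a
    n_rows = len(row_ptr) - 1
    out = [[0] * n_cols_b for _ in range(n_rows)]
    for i in range(n_rows):
        # Each nonzero of row i drives the same MAC loop, regardless of
        # whether A was originally sparse (SpMM) or dense (GEMM).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a_val, j = values[k], col_idx[k]
            for c in range(n_cols_b):
                out[i][c] += a_val * dense_b[j][c]
    return out

# Sparse adjacency-like operand (SpMM case) and dense operand (GEMM case)
A_sparse = [[0, 2, 0], [1, 0, 0]]
A_dense = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [1, 1]]

spmm = unified_matmul(to_csr(A_sparse), B, 2)  # -> [[0, 2], [1, 0]]
gemm = unified_matmul(to_csr(A_dense), B, 2)   # -> [[4, 5], [10, 11]]
```

In hardware, the analogous benefit is that the same processing elements stay busy in both phases of GNN inference (aggregation and combination), which is the utilization gain the abstract describes.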
Pages: 33