Graph-OPU: A Highly Flexible FPGA-Based Overlay Processor for Graph Neural Networks

Authors
Tang, Enhao [1]
Li, Shun [1]
Chen, Ruiqi [1]
Zhou, Hao [1]
Zhang, Haoyang [1]
Ma, Yuhanxiao [2]
Yu, Jun [1]
Wang, Kun [1]
Affiliations
[1] Fudan Univ, Sch Microelect, Shanghai, Peoples R China
[2] NYU, New York, NY USA
Keywords
Graph Neural Networks; Custom Processor; Hardware Accelerator
DOI
10.1145/3691636
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Field-programmable gate arrays (FPGAs) are an ideal candidate for accelerating graph neural networks (GNNs). However, redeploying the FPGA is time-consuming when updating or switching between the diverse GNN models used across different applications. Existing GNN processors eliminate the need for FPGA redeployment when switching between GNN models, but they adapt to different matrix multiplication types by switching processing units, which decreases hardware utilization. In addition, DDR bandwidth limits further improvements in hardware performance. This article proposes Graph-OPU, a highly flexible FPGA-based overlay processor for GNN acceleration. Graph-OPU provides excellent flexibility and programmability for users: the executable code of a GNN model is automatically compiled and reloaded without requiring FPGA redeployment. First, we customize the compiler and instruction set for the inference process of different GNN models. Second, we customize the datapath and optimize the data format in the microarchitecture to fully leverage the advantages of high bandwidth memory (HBM). Third, we design a unified matrix multiplication that handles both sparse-dense matrix multiplication (SpMM) and general matrix multiplication (GEMM), enhancing Graph-OPU's performance. During execution, the computational units are shared between SpMM and GEMM instead of being switched, which improves hardware utilization. Finally, we implement a hardware prototype on the Xilinx Alveo U50 and test mainstream GNN models on various datasets. Experimental results show that Graph-OPU achieves up to 1,654x and 63x speedup, as well as up to 5,305x and 422x energy efficiency improvements, compared to implementations on CPU and GPU, respectively. Graph-OPU outperforms state-of-the-art (SOTA) end-to-end overlay accelerators for GNNs, reducing latency by 1.36x and improving energy efficiency by 1.41x on average. Moreover, Graph-OPU achieves an average 1.45x end-to-end latency speedup over the SOTA GNN processor. Graph-OPU represents an
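The abstract's central architectural idea is that SpMM (the aggregation step, adjacency matrix times features) and GEMM (the combination step, features times weights) can run on the same compute units rather than on separate, switched ones. The sketch below illustrates that idea in software only, under an assumed representation: every left operand is expressed as per-row streams of (column, value) pairs, so a dense row is just a fully populated stream, and one multiply-accumulate loop serves both cases. The names `dense_to_rows`, `csr_to_rows`, and `unified_matmul` are hypothetical and do not describe Graph-OPU's actual datapath, data format, or instruction set, which the abstract does not specify.

```python
from typing import List, Tuple

# Hypothetical row-stream format: one row as a list of (column index, value).
# A sparse row lists only its nonzeros; a dense row lists every column.
Row = List[Tuple[int, float]]


def dense_to_rows(m: List[List[float]]) -> List[Row]:
    """Express a dense matrix in the same row-stream format as a sparse one."""
    return [[(j, v) for j, v in enumerate(r)] for r in m]


def csr_to_rows(indptr: List[int], indices: List[int], data: List[float]) -> List[Row]:
    """Convert a CSR matrix (indptr, indices, data) into row streams."""
    return [
        list(zip(indices[indptr[i]:indptr[i + 1]], data[indptr[i]:indptr[i + 1]]))
        for i in range(len(indptr) - 1)
    ]


def unified_matmul(left: List[Row], right: List[List[float]]) -> List[List[float]]:
    """Compute left @ right with a single multiply-accumulate loop.

    The same loop serves SpMM (sparse left operand) and GEMM (dense left
    operand); only the length of each row stream differs.
    """
    n_out = len(right[0])
    out = []
    for row in left:
        acc = [0.0] * n_out  # accumulator for one output row
        for k, a in row:
            rk = right[k]
            for j in range(n_out):
                acc[j] += a * rk[j]  # the shared MAC operation
        out.append(acc)
    return out


# Toy GCN-style layer H = A @ (X @ W) on a 3-node graph (illustrative values).
A = csr_to_rows(indptr=[0, 2, 3, 5], indices=[0, 1, 1, 0, 2], data=[1.0] * 5)
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

XW = unified_matmul(dense_to_rows(X), W)  # GEMM: combination
H = unified_matmul(A, XW)                 # SpMM: aggregation
print(H)  # [[4.0, 6.0], [3.0, 4.0], [6.0, 8.0]]
```

Treating GEMM as the fully dense special case of SpMM is what lets one loop (in hardware, one array of MAC units) handle both phases; how Graph-OPU schedules and feeds those units from HBM is beyond what this sketch models.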
Pages: 33