OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks

Cited by: 92
Authors
Yu, Yunxuan [1 ]
Wu, Chen [1 ]
Zhao, Tiandong [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliation
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; field-programmable gate array (FPGA) acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2019.2939726
CLC number
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for accelerating deep convolutional neural networks (CNNs). In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as complicated tasks increasingly adopt cascades of deep CNN algorithms, network-specific accelerators force the FPGA to be reconfigured at runtime, which is difficult on edge devices. Moreover, a network-specific accelerator requires regenerating RTL code and redoing physical implementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNN networks. It offers software-like programmability to CNN end users: CNN algorithms are automatically compiled into executable code that OPU loads and executes, with no FPGA reconfiguration needed to switch or update CNN networks. OPU instructions perform complicated functions with variable runtimes but a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while keeping the microarchitecture and compiler simple to develop. Experiments show that OPU achieves an average of 91% runtime multiplication-and-accumulation unit (MAC) efficiency (RME) across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators in the literature. In addition, OPU shows 5.35x better power efficiency than a Titan Xp. For a real-time cascaded-CNN scenario, OPU is 2.9x faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.
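The abstract's headline metric, runtime MAC efficiency (RME), is not defined in detail in this record. One plausible reading is the ratio of useful multiply-accumulate operations to the operations the MAC array could have performed in the same number of cycles. The short Python sketch below illustrates that reading for a single convolution layer; the helper names, the example layer shape, and the assumption of one MAC per unit per cycle are hypothetical and not taken from the paper.

# Hedged sketch: one plausible way to estimate runtime MAC efficiency (RME)
# for a convolution layer. The formula is an assumption inferred from the
# abstract's wording, not the paper's exact definition.

def conv_mac_count(out_h, out_w, out_c, in_c, k_h, k_w):
    """Total multiply-accumulate operations required by one conv layer."""
    return out_h * out_w * out_c * in_c * k_h * k_w

def runtime_mac_efficiency(total_macs, num_mac_units, cycles_used):
    """Useful MACs performed divided by the MACs the array could have performed.

    Assumes each physical MAC unit completes one multiply-accumulate per
    clock cycle, a common idealization for FPGA DSP-based arrays.
    """
    peak_macs = num_mac_units * cycles_used
    return total_macs / peak_macs

if __name__ == "__main__":
    # Hypothetical example: a 3x3 convolution producing a 56x56x256 output
    # from 256 input channels, run on an overlay with 1024 MAC units that
    # finishes the layer in 2,000,000 cycles.
    macs = conv_mac_count(56, 56, 256, 256, 3, 3)
    rme = runtime_mac_efficiency(macs, num_mac_units=1024, cycles_used=2_000_000)
    print(f"RME = {rme:.2%}")  # about 90% for these made-up numbers

Under this reading, the reported average RME of 91% would mean the MAC array is kept busy with useful work for roughly nine out of every ten cycles, averaged over the nine benchmark networks.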
Pages: 35 - 47
Page count: 13
Related papers (50 in total)
  • [1] Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
    Yu, Yunxuan
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    2020 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA '20), 2020, : 122 - 132
  • [2] MP-OPU: A Mixed Precision FPGA-based Overlay Processor for Convolutional Neural Networks
    Wu, Chen
    Zhuang, Jinming
    Wang, Kun
    He, Lei
    2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021), 2021, : 33 - 37
  • [3] Transformer-OPU: An FPGA-based Overlay Processor for Transformer Networks
    Bai, Yueyin
    Zhou, Hao
    Zhao, Keqing
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    2023 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM, 2023, : 222 - 222
  • [4] Graph-OPU: A Highly Integrated FPGA-Based Overlay Processor for Graph Neural Networks
    Chen, Ruiqi
    Zhang, Haoyang
    Li, Shun
    Tang, Enhao
    Yu, Jun
    Wang, Kun
    2023 33RD INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL, 2023, : 228 - 234
  • [5] Graph-OPU: A Highly Flexible FPGA-Based Overlay Processor for Graph Neural Networks
    Tang, Enhao
    Li, Shun
    Chen, Ruiqi
    Zhou, Hao
    Zhang, Haoyang
    Ma, Yuhanxiao
    Yu, Jun
    Wang, Kun
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2024, 17 (04)
  • [6] FET-OPU: A Flexible and Efficient FPGA-based Overlay Processor for Transformer Networks
    Bai, Yueyin
    Zhou, Hao
    Zhao, Keqing
    Wang, Hongji
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023,
  • [7] An FPGA-Based Processor for Training Convolutional Neural Networks
    Liu, Zhiqiang
    Dou, Yong
    Jiang, Jingfei
    Wang, Qiang
    Chow, Paul
    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 207 - 210
  • [8] LTrans-OPU: A Low-Latency FPGA-based Overlay Processor for Transformer Networks
    Bai, Yueyin
    Zhou, Hao
    Zhao, Keqing
    Zhang, Manting
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    2023 33RD INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL, 2023, : 283 - 287
  • [9] CNP: AN FPGA-BASED PROCESSOR FOR CONVOLUTIONAL NETWORKS
    Farabet, Clement
    Poulet, Cyril
    Han, Jefferson Y.
    LeCun, Yann
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 32 - +
  • [10] Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks
    Yu, Yunxuan
    Zhao, Tiandong
    Wang, Mingyu
    Wang, Kun
    He, Lei
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (07) : 1545 - 1556