OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks

Cited by: 92
Authors
Yu, Yunxuan [1 ]
Wu, Chen [1 ]
Zhao, Tiandong [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; field-programmable gate array (FPGA) acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2019.2939726
Chinese Library Classification: TP3 [Computing technology, computer technology]
Discipline code: 0812
Abstract
Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for deep convolutional neural network (CNN) acceleration. In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as more complicated tasks adopt cascades of deep CNN algorithms, runtime reconfiguration of the FPGA device becomes unavoidable when network-specific accelerators are employed. Such reconfiguration can be difficult for edge devices. Moreover, a network-specific accelerator requires regeneration of RTL code and physical reimplementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNN networks. It offers software-like programmability for CNN end users: CNN algorithms are automatically compiled into executable codes, which are loaded and executed by OPU without reconfiguring the FPGA to switch or update CNN networks. OPU instructions perform complicated functions with variable runtimes but have a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while reducing the complexity of developing the microarchitecture and compiler. Experiments show that OPU achieves an average of 91% runtime multiply-accumulate (MAC) unit efficiency (RME) across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators in the literature. In addition, OPU shows 5.35x better power efficiency than a Titan Xp. For a real-time cascaded-CNN scenario, OPU is 2.9x faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.
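The abstract's headline number is 91% RME but does not define the metric. As an illustrative sketch only (not from the paper): one common way to express runtime MAC efficiency is the ratio of the MAC operations a workload actually requires to the MAC slots the hardware offers over the measured runtime. All names and numbers below are hypothetical.

```python
# Hedged sketch: RME as useful MAC work divided by available MAC slots.
# Assumption (not stated in the abstract): RME = required_macs /
# (num_mac_units * clock_hz * runtime_s). Figures below are made up.

def rme(required_macs: float, num_mac_units: int,
        clock_hz: float, runtime_s: float) -> float:
    """Fraction of available MAC slots spent on useful work."""
    available_macs = num_mac_units * clock_hz * runtime_s
    return required_macs / available_macs

# Hypothetical workload: 30.9 GMACs on 1024 MAC units at 200 MHz,
# finishing in 0.166 s.
print(round(rme(30.9e9, 1024, 200e6, 0.166), 2))  # → 0.91
```

Under this reading, a 91% average RME means the MAC array is kept busy with useful work for roughly nine cycles out of ten across the nine benchmarked networks, which is the gap the overlay's instruction granularity is tuned to close.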
Pages: 35-47
Page count: 13