Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by: 35
Authors
Yu, Yunxuan [1 ]
Zhao, Tiandong [1 ]
Wang, Mingyu [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2020.2995741
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this article, we design Uni-OPU, the first full software/hardware stack for efficient, uniform hardware acceleration of both transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler transforms the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), as well as CONV layers, into the same pattern. The compiler performs the following optimizations: 1) eliminating up to 98.4% of the operations in TCONV by exploiting the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address-generation scheme and data-flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto the hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up this uniform computation pattern, reaching a throughput of up to 2.35 TOPS on TCONV layers while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, whose acceleration has not been explored before. On average, we observe 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks, respectively, compared with a Titan Xp GPU. To the best of our knowledge, this is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
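The source of the compiler's operation savings (optimization 1 above) can be illustrated with a minimal NumPy sketch. This is an illustration under assumed helper names (zero_insert, conv2d_valid), not the authors' code: zero-TCONV upsamples the input by inserting zeros between pixels before running an ordinary convolution, so a fixed, known fraction of the multiplications involve an inserted zero and can be skipped outright.
```python
import numpy as np

def zero_insert(x, stride):
    """Zero-TCONV upsampling: insert (stride - 1) zeros between
    neighboring pixels of the input feature map."""
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    return up

def conv2d_valid(x, k):
    """Naive 'valid' 2-D convolution applied after the upsampling step."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Toy 2x-upsampling TCONV layer realized as zero-insertion + ordinary CONV.
x = np.random.rand(8, 8).astype(np.float32)
k = np.random.rand(3, 3).astype(np.float32)
up = zero_insert(x, stride=2)
y = conv2d_valid(np.pad(up, 1), k)  # pad so the CONV covers the full upsampled map

# Fraction of multiplication operands that are inserted zeros: work a
# TCONV-aware compiler can drop because the zero pattern is fixed and known.
zero_fraction = 1.0 - np.count_nonzero(up) / up.size
print(f"share of inserted zeros in the upsampled map: {zero_fraction:.1%}")
```
For this toy stride-2 case, roughly 72% of the upsampled pixels are zeros; the zero share grows roughly as 1 - 1/stride^2 for larger strides, which is consistent with the large operation savings quoted in the abstract. NN-TCONV replaces zero-insertion with nearest-neighbor pixel replication, and the compiler reformulates both variants, together with plain CONV, into the same streaming vector-multiplication pattern.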
Pages: 1545-1556
Page count: 12
Related Papers (50 in total; items [31]-[40] listed below)
  • [31] Energy-Efficient and High-Throughput FPGA-based Accelerator for Convolutional Neural Networks
    Feng, Gan
    Hu, Zuyi
    Chen, Song
    Wu, Feng
    2016 13TH IEEE INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY (ICSICT), 2016, : 624 - 626
  • [32] FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network
    Nakahara, Hiroki
    Sada, Youki
    Shimoda, Masayuki
    Sayama, Kouki
    Jinguji, Akira
    Sato, Shimpei
    2019 29TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2019, : 180 - 186
  • [33] Scalable FPGA-Based Convolutional Neural Network Accelerator for Embedded Systems
    Zhao, Jingyuan
    Yin, Zhendong
    Zhao, Yanlong
    Wu, Mingyang
    Xu, Mingdong
    2019 4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (ICCIA 2019), 2019, : 36 - 40
  • [34] An FPGA-Based Computation-Efficient Convolutional Neural Network Accelerator
    Archana, V. S.
    2022 IEEE INTERNATIONAL POWER AND RENEWABLE ENERGY CONFERENCE, IPRECON, 2022
  • [35] Design-Space Exploration of Quantized Transposed Convolutional Neural Networks for FPGA-based Systems-on-Chip
    Sestito, Cristian
    Perri, Stefania
    Stewart, Robert
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 31 - 36
  • [36] A Scalable FPGA Accelerator for Convolutional Neural Networks
    Xu, Ke
    Wang, Xiaoyun
    Fu, Shihang
    Wang, Dong
    ADVANCED COMPUTER ARCHITECTURE, 2018, 908 : 3 - 14
  • [37] A survey of graph convolutional networks (GCNs) in FPGA-based accelerators
    Procaccini, Marco
    Sahebi, Amin
    Giorgi, Roberto
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [38] Implementation of Data-optimized FPGA-based Accelerator for Convolutional Neural Network
    Cho, Mannhee
    Kim, Youngmin
    2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020
  • [39] LW-GCN: A Lightweight FPGA-based Graph Convolutional Network Accelerator
    Tao, Zhuofu
    Wu, Chen
    Liang, Yuan
    Wang, Kun
    He, Lei
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2023, 16 (01)
  • [40] VHDL Generator for A High Performance Convolutional Neural Network FPGA-Based Accelerator
    Hamdan, Muhammad K.
    Rover, Diane T.
    2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2017