Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by: 35
Authors
Yu, Yunxuan [1 ]
Zhao, Tiandong [1 ]
Wang, Mingyu [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2020.2995741
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this article, we design the first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), together with CONV layers, into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of operations in TCONV by making use of the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up our uniform computation pattern, with throughput up to 2.35 TOPS for the TCONV layer while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, the acceleration of which has not been explored before. In summary, we observe 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks, respectively, compared with the Titan Xp GPU on average. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
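To make the abstract's key observation concrete, the following minimal NumPy sketch (an illustration only, not the Uni-OPU compiler; the function names and the toy 4x4 input are assumptions) shows that both zero-TCONV and NN-TCONV reduce to an upsampling step followed by an ordinary convolution, and that zero-insertion leaves a fixed, compile-time-known pattern of zeros whose multiplications can be pruned.

```python
# Illustrative sketch (not the authors' compiler): zero-TCONV and NN-TCONV
# both become "upsample, then ordinary CONV", which is the property that
# lets one CONV-style dataflow serve all three layer types.
import numpy as np

def zero_insert_upsample(x, stride=2):
    """Place input pixels on a stride-spaced grid of zeros (zero-TCONV upsampling)."""
    h, w = x.shape
    up = np.zeros((h * stride, w * stride), dtype=x.dtype)
    up[::stride, ::stride] = x          # only 1/stride^2 of the entries are nonzero
    return up

def nn_upsample(x, stride=2):
    """Nearest-neighbor resize (NN-TCONV upsampling): every entry is nonzero."""
    return np.repeat(np.repeat(x, stride, axis=0), stride, axis=1)

def conv2d(x, k):
    """Naive 'valid' 2-D convolution applied after either upsampling step."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # toy feature map
k = np.ones((3, 3))                            # toy kernel

zero_tconv_out = conv2d(zero_insert_upsample(x), k)
nn_tconv_out   = conv2d(nn_upsample(x), k)

# Fraction of multiply-accumulates that hit a structural zero in zero-TCONV;
# this fixed pattern is what a compiler can prune ahead of time.
up = zero_insert_upsample(x)
print("zero fraction after zero-insertion:", 1 - np.count_nonzero(up) / up.size)
```

For a stride of 2, three quarters of the upsampled positions are structurally zero; presumably larger strides and the overlap structure of the kernel are what push the pruned fraction toward the 98.4% figure quoted in the abstract.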
Pages: 1545-1556
Number of pages: 12
Related Papers
50 records in total
  • [1] OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks
    Yu, Yunxuan
    Wu, Chen
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (01) : 35 - 47
  • [2] An Efficient FPGA-Based Dilated and Transposed Convolutional Neural Network Accelerator
    Wu, Tsung-Hsi
    Shu, Chang
    Liu, Tsung-Te
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (11) : 5178 - 5186
  • [3] FPGA-based Accelerator for Losslessly Quantized Convolutional Neural Networks
    Sit, Mankit
    Kazami, Ryosuke
    Amano, Hideharu
    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 295 - 298
  • [4] An FPGA-based Accelerator Implementation for Deep Convolutional Neural Networks
    Zhou, Yongmei
    Jiang, Jingfei
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 829 - 832
  • [5] Composite FPGA-based Accelerator for Deep Convolutional Neural Networks
    Zhang, Huan
    Yang, Yuan
    Xiao, Yang
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC), 2019
  • [6] A FPGA-based Hardware Accelerator for Multiple Convolutional Neural Networks
    Yao, Yuchen
    Duan, Qinghua
    Zhang, Zhiqian
    Gao, Jiabao
    Wang, Jian
    Yang, Meng
    Tao, Xinxuan
    Lai, Jinmei
    2018 14TH IEEE INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY (ICSICT), 2018, : 1075 - 1077
  • [7] Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
    Yu, Yunxuan
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    2020 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA '20), 2020, : 122 - 132
  • [8] SpCNA: An FPGA-based Accelerator for Point Cloud Convolutional Neural Networks
    Zhou, Gong-Lang
    Guo, Kaiyuan
    Chen, Xiang
    Leung, Kwok Wa
    2023 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM, 2023, : 211 - 211
  • [9] Optimization of Energy Efficiency for FPGA-Based Convolutional Neural Networks Accelerator
    Tang, Yongming
    Dai, Rongshi
    Xie, Yi
    2020 4TH INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND ARTIFICIAL INTELLIGENCE (CCEAI 2020), 2020, 1487
  • [10] FPGA-based Accelerator for Deep Convolutional Neural Networks for the SPARK Environment
    Morcel, Raghid
    Ezzeddine, Mazen
    Akkary, Haitham
    2016 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD), 2016, : 126 - 133