Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

被引:35
|
作者
Yu, Yunxuan [1 ]
Zhao, Tiandong [1 ]
Wang, Mingyu [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
机构
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
关键词
Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign;
D O I
10.1109/TVLSI.2020.2995741
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
In this article, we design the first full software/hardware stack, called Uni-OPU, for an efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV, i.e., zero-inserting-based TCONV (zero-TCONV), nearest-neighbor resizing-based TCONV (NN-TCONV), and CONV layers into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of operations in TCONV by making use of the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speedup our uniform computation pattern with throughput up to 2.35 TOPS for the TCONV layer, consuming only 2.89 W dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU is able to gain 1.45x to 3.68x superior power efficiency compared with state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, the acceleration of which have not been explored before. In summary, we observe 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency on zero-TCONV and NN-TCONV networks compared with Titan Xp GPU on average. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
引用
收藏
页码:1545 / 1556
页数:12
相关论文
共 50 条
  • [21] An FPGA-Based Processor for Training Convolutional Neural Networks
    Liu, Zhiqiang
    Dou, Yong
    Jiang, Jingfei
    Wang, Qiang
    Chow, Paul
    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 207 - 210
  • [22] An Efficient FPGA-Based Architecture for Convolutional Neural Networks
    Hwang, Wen-Jyi
    Jhang, Yun-Jie
    Tai, Tsung-Ming
    2017 40TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2017, : 582 - 588
  • [23] A survey of FPGA-based accelerators for convolutional neural networks
    Sparsh Mittal
    Neural Computing and Applications, 2020, 32 : 1109 - 1139
  • [24] A survey of FPGA-based accelerators for convolutional neural networks
    Mittal, Sparsh
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (04): : 1109 - 1139
  • [25] Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity
    Liu Qinrang
    Liu Chongyang
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2018, 40 (06) : 1368 - 1374
  • [26] High Energy Efficiency FPGA-based Accelerator for Convolutional Neural Networks Using Weight Combination
    Shu, Chenghao
    Pang, Wei
    Liu, Hao
    Lu, Shengli
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 578 - 582
  • [27] A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm
    Huang, Y.
    Shen, J.
    Wang, Z.
    Wen, M.
    Zhang, C.
    2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND CONTROL ENGINEERING (ICECC), 2018, 1026
  • [28] FPGA-based Accelerator for Convolutional Neural Network Application in Mobile Robotics
    Mazzetto, Lucas F. R.
    Castanho, Jose E. C.
    2023 LATIN AMERICAN ROBOTICS SYMPOSIUM, LARS, 2023 BRAZILIAN SYMPOSIUM ON ROBOTICS, SBR, AND 2023 WORKSHOP ON ROBOTICS IN EDUCATION, WRE, 2023, : 433 - 438
  • [29] A FPGA-based Accelerator of Convolutional Neural Network for Face Feature Extraction
    Ding, Ru
    Su, Guangda
    Bai, Guoqiang
    Xu, Wei
    Su, Nan
    Wu, Xingjun
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC), 2019,
  • [30] FPGA-Based Unified Accelerator for Convolutional Neural Network and Vision Transformer
    Li T.
    Zhang F.
    Wang S.
    Cao W.
    Chen L.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (06): : 2663 - 2672