Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by: 35

Authors:

Yu, Yunxuan [1]

Zhao, Tiandong [1]

Wang, Mingyu [1]

Wang, Kun [1]

He, Lei [1]

Affiliations:

[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA

Source:

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | 2020, Vol. 28, No. 7

Keywords:

Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign;

DOI:

10.1109/TVLSI.2020.2995741

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this article, we design the first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), and of CONV layers into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of operations in TCONV by making use of the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up our uniform computation pattern, with throughput up to 2.35 TOPS for the TCONV layer while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, the acceleration of which has not been explored before. In summary, we observe on average 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks, respectively, compared with a Titan Xp GPU. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
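The two upsampling schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the Uni-OPU compiler and does not reproduce the paper's optimizations; the function names `zero_insert`, `nn_resize`, and `inserted_fraction` are hypothetical, chosen only for illustration. The sketch shows the fixed pattern each scheme produces, which is what makes part of the subsequent convolution work predictable in advance.

```python
def zero_insert(x, stride):
    """Zero-inserting upsampling: place (stride - 1) zeros between
    neighbouring elements of a 2-D list (illustrative helper)."""
    h, w = len(x), len(x[0])
    out_h, out_w = (h - 1) * stride + 1, (w - 1) * stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            out[i * stride][j * stride] = x[i][j]
    return out


def nn_resize(x, scale):
    """Nearest-neighbor resizing: repeat every element scale x scale times."""
    h, w = len(x), len(x[0])
    return [[x[i // scale][j // scale] for j in range(w * scale)]
            for i in range(h * scale)]


def inserted_fraction(h, w, stride):
    """Fraction of positions in the zero-inserted map that are inserted zeros.
    A convolution over that map multiplies by these zeros to no effect."""
    total = ((h - 1) * stride + 1) * ((w - 1) * stride + 1)
    return 1 - (h * w) / total


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    print(zero_insert(x, 2))
    print(nn_resize(x, 2))
    print(inserted_fraction(2, 2, 2))
```

Because the positions of the inserted zeros (or of the repeated values) depend only on the stride or scale, not on the data, a compiler can in principle decide ahead of time which multiplications to skip or merge.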


Pages: 1545-1556

Page count: 12
