Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by: 35
Authors
Yu, Yunxuan [1 ]
Zhao, Tiandong [1 ]
Wang, Mingyu [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2020.2995741
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
In this article, we design the first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), together with CONV layers, into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of operations in TCONV by making use of the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up our uniform computation pattern, with throughput up to 2.35 TOPS for the TCONV layer while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, the acceleration of which has not been explored before. In summary, we observe 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks, respectively, compared with the Titan Xp GPU on average. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
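To make the abstract's key observation concrete, the following minimal NumPy sketch (an illustration only, not the Uni-OPU compiler; the function names and the toy 4x4 input are assumptions) shows that both zero-TCONV and NN-TCONV reduce to an upsampling step followed by an ordinary convolution, and that zero-insertion leaves a fixed, compile-time-known pattern of zeros whose multiplications can be pruned.

```python
# Illustrative sketch (not the authors' compiler): zero-TCONV and NN-TCONV
# both become "upsample, then ordinary CONV", which is the property that
# lets one CONV-style dataflow serve all three layer types.
import numpy as np

def zero_insert_upsample(x, stride=2):
    """Place input pixels on a stride-spaced grid of zeros (zero-TCONV upsampling)."""
    h, w = x.shape
    up = np.zeros((h * stride, w * stride), dtype=x.dtype)
    up[::stride, ::stride] = x          # only 1/stride^2 of the entries are nonzero
    return up

def nn_upsample(x, stride=2):
    """Nearest-neighbor resize (NN-TCONV upsampling): every entry is nonzero."""
    return np.repeat(np.repeat(x, stride, axis=0), stride, axis=1)

def conv2d(x, k):
    """Naive 'valid' 2-D convolution applied after either upsampling step."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # toy feature map
k = np.ones((3, 3))                            # toy kernel

zero_tconv_out = conv2d(zero_insert_upsample(x), k)
nn_tconv_out   = conv2d(nn_upsample(x), k)

# Fraction of multiply-accumulates that hit a structural zero in zero-TCONV;
# this fixed pattern is what a compiler can prune ahead of time.
up = zero_insert_upsample(x)
print("zero fraction after zero-insertion:", 1 - np.count_nonzero(up) / up.size)
```

For a stride of 2, three quarters of the upsampled positions are structurally zero; presumably larger strides and the overlap structure of the kernel are what push the pruned fraction toward the 98.4% figure quoted in the abstract.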
Pages: 1545-1556
Number of pages: 12
Related Papers
50 records in total
  • [1] OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks
    Yu, Yunxuan
    Wu, Chen
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (01) : 35 - 47
  • [2] An Efficient FPGA-Based Dilated and Transposed Convolutional Neural Network Accelerator
    Wu, Tsung-Hsi
    Shu, Chang
    Liu, Tsung-Te
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (11) : 5178 - 5186
  • [3] FPGA-based Accelerator for Losslessly Quantized Convolutional Neural Networks
    Sit, Mankit
    Kazami, Ryosuke
    Amano, Hideharu
    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 295 - 298
  • [4] An FPGA-based Accelerator Implementation for Deep Convolutional Neural Networks
    Zhou, Yongmei
    Jiang, Jingfei
    PROCEEDINGS OF 2015 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2015), 2015, : 829 - 832
  • [5] Composite FPGA-based Accelerator for Deep Convolutional Neural Networks
    Zhang, Huan
    Yang, Yuan
    Xiao, Yang
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC), 2019
  • [6] A FPGA-based Hardware Accelerator for Multiple Convolutional Neural Networks
    Yao, Yuchen
    Duan, Qinghua
    Zhang, Zhiqian
    Gao, Jiabao
    Wang, Jian
    Yang, Meng
    Tao, Xinxuan
    Lai, Jinmei
    2018 14TH IEEE INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY (ICSICT), 2018, : 1075 - 1077
  • [7] Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
    Yu, Yunxuan
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    2020 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA '20), 2020, : 122 - 132
  • [8] SpCNA: An FPGA-based Accelerator for Point Cloud Convolutional Neural Networks
    Zhou, Gong-Lang
    Guo, Kaiyuan
    Chen, Xiang
    Leung, Kwok Wa
    2023 IEEE 31ST ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, FCCM, 2023, : 211 - 211
  • [9] Optimization of Energy Efficiency for FPGA-Based Convolutional Neural Networks Accelerator
    Tang, Yongming
    Dai, Rongshi
    Xie, Yi
    2020 4TH INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND ARTIFICIAL INTELLIGENCE (CCEAI 2020), 2020, 1487
  • [10] FPGA-based Accelerator for Deep Convolutional Neural Networks for the SPARK Environment
    Morcel, Raghid
    Ezzeddine, Mazen
    Akkary, Haitham
    2016 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD), 2016, : 126 - 133