Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Cited by: 35
Authors
Yu, Yunxuan [1 ]
Zhao, Tiandong [1 ]
Wang, Mingyu [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; FPGA acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2020.2995741
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this article, we design Uni-OPU, the first full software/hardware stack for efficient, uniform hardware acceleration of both transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler transforms the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), as well as CONV layers, into the same pattern. The compiler performs the following optimizations: 1) eliminating up to 98.4% of the operations in TCONV by exploiting the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address-generation scheme and data-flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto the hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up this uniform computation pattern, reaching a throughput of up to 2.35 TOPS on TCONV layers while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, whose acceleration has not been explored before. On average, we observe 1.90x and 1.63x latency reduction, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks, respectively, compared with a Titan Xp GPU. To the best of our knowledge, this is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
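The source of the compiler's operation savings (optimization 1 above) can be illustrated with a minimal NumPy sketch. This is an illustration under assumed helper names (zero_insert, conv2d_valid), not the authors' code: zero-TCONV upsamples the input by inserting zeros between pixels before running an ordinary convolution, so a fixed, known fraction of the multiplications involve an inserted zero and can be skipped outright.
```python
import numpy as np

def zero_insert(x, stride):
    """Zero-TCONV upsampling: insert (stride - 1) zeros between
    neighboring pixels of the input feature map."""
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    return up

def conv2d_valid(x, k):
    """Naive 'valid' 2-D convolution applied after the upsampling step."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Toy 2x-upsampling TCONV layer realized as zero-insertion + ordinary CONV.
x = np.random.rand(8, 8).astype(np.float32)
k = np.random.rand(3, 3).astype(np.float32)
up = zero_insert(x, stride=2)
y = conv2d_valid(np.pad(up, 1), k)  # pad so the CONV covers the full upsampled map

# Fraction of multiplication operands that are inserted zeros: work a
# TCONV-aware compiler can drop because the zero pattern is fixed and known.
zero_fraction = 1.0 - np.count_nonzero(up) / up.size
print(f"share of inserted zeros in the upsampled map: {zero_fraction:.1%}")
```
For this toy stride-2 case, roughly 72% of the upsampled pixels are zeros; the zero share grows roughly as 1 - 1/stride^2 for larger strides, which is consistent with the large operation savings quoted in the abstract. NN-TCONV replaces zero-insertion with nearest-neighbor pixel replication, and the compiler reformulates both variants, together with plain CONV, into the same streaming vector-multiplication pattern.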
Pages: 1545-1556
Page count: 12
Related Papers (50 in total; items [31]-[40] listed below)
  • [31] Energy-Efficient and High-Throughput FPGA-based Accelerator for Convolutional Neural Networks
    Feng, Gan
    Hu, Zuyi
    Chen, Song
    Wu, Feng
    2016 13TH IEEE INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY (ICSICT), 2016, : 624 - 626
  • [32] FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network
    Nakahara, Hiroki
    Sada, Youki
    Shimoda, Masayuki
    Sayama, Kouki
    Jinguji, Akira
    Sato, Shimpei
    2019 29TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2019, : 180 - 186
  • [33] Scalable FPGA-Based Convolutional Neural Network Accelerator for Embedded Systems
    Zhao, Jingyuan
    Yin, Zhendong
    Zhao, Yanlong
    Wu, Mingyang
    Xu, Mingdong
    2019 4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (ICCIA 2019), 2019, : 36 - 40
  • [34] An FPGA-Based Computation-Efficient Convolutional Neural Network Accelerator
    Archana, V. S.
    2022 IEEE INTERNATIONAL POWER AND RENEWABLE ENERGY CONFERENCE, IPRECON, 2022
  • [35] Design-Space Exploration of Quantized Transposed Convolutional Neural Networks for FPGA-based Systems-on-Chip
    Sestito, Cristian
    Perri, Stefania
    Stewart, Robert
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 31 - 36
  • [36] A Scalable FPGA Accelerator for Convolutional Neural Networks
    Xu, Ke
    Wang, Xiaoyun
    Fu, Shihang
    Wang, Dong
    ADVANCED COMPUTER ARCHITECTURE, 2018, 908 : 3 - 14
  • [37] A survey of graph convolutional networks (GCNs) in FPGA-based accelerators
    Procaccini, Marco
    Sahebi, Amin
    Giorgi, Roberto
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [38] Implementation of Data-optimized FPGA-based Accelerator for Convolutional Neural Network
    Cho, Mannhee
    Kim, Youngmin
    2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020
  • [39] LW-GCN: A Lightweight FPGA-based Graph Convolutional Network Accelerator
    Tao, Zhuofu
    Wu, Chen
    Liang, Yuan
    Wang, Kun
    He, Lei
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2023, 16 (01)
  • [40] VHDL Generator for A High Performance Convolutional Neural Network FPGA-Based Accelerator
    Hamdan, Muhammad K.
    Rover, Diane T.
    2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2017