OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks

Cited by: 92
Authors
Yu, Yunxuan [1 ]
Wu, Chen [1 ]
Zhao, Tiandong [1 ]
Wang, Kun [1 ]
He, Lei [1 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Convolutional neural network (CNN) overlay processor; field-programmable gate array (FPGA) acceleration; hardware-software codesign
DOI
10.1109/TVLSI.2019.2939726
Chinese Library Classification: TP3 [Computing technology, computer technology]
Discipline code: 0812
Abstract
Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for deep convolutional neural network (CNN) acceleration. In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as more complicated tasks adopt cascades of deep CNN algorithms, runtime reconfiguration of the FPGA device becomes unavoidable when network-specific accelerators are employed. Such reconfiguration can be difficult for edge devices. Moreover, a network-specific accelerator requires regeneration of RTL code and physical reimplementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNN networks. It offers software-like programmability for CNN end users: CNN algorithms are automatically compiled into executable codes, which are loaded and executed by OPU without reconfiguring the FPGA to switch or update CNN networks. OPU instructions perform complicated functions with variable runtimes but have a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while reducing the complexity of developing the microarchitecture and compiler. Experiments show that OPU achieves an average of 91% runtime multiply-accumulate (MAC) unit efficiency (RME) across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators in the literature. In addition, OPU shows 5.35x better power efficiency than a Titan Xp. For a real-time cascaded-CNN scenario, OPU is 2.9x faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.
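The abstract's headline number is 91% RME but does not define the metric. As an illustrative sketch only (not from the paper): one common way to express runtime MAC efficiency is the ratio of the MAC operations a workload actually requires to the MAC slots the hardware offers over the measured runtime. All names and numbers below are hypothetical.

```python
# Hedged sketch: RME as useful MAC work divided by available MAC slots.
# Assumption (not stated in the abstract): RME = required_macs /
# (num_mac_units * clock_hz * runtime_s). Figures below are made up.

def rme(required_macs: float, num_mac_units: int,
        clock_hz: float, runtime_s: float) -> float:
    """Fraction of available MAC slots spent on useful work."""
    available_macs = num_mac_units * clock_hz * runtime_s
    return required_macs / available_macs

# Hypothetical workload: 30.9 GMACs on 1024 MAC units at 200 MHz,
# finishing in 0.166 s.
print(round(rme(30.9e9, 1024, 200e6, 0.166), 2))  # → 0.91
```

Under this reading, a 91% average RME means the MAC array is kept busy with useful work for roughly nine cycles out of ten across the nine benchmarked networks, which is the gap the overlay's instruction granularity is tuned to close.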
Pages: 35-47
Page count: 13