FPGA Implementation of MobileNetV2 CNN Model Using Semi-Streaming Architecture for Low Power Inference Applications

Cited by: 4

Authors:

Shaydyuk, Nazariy K. [1]

John, Eugene B. [1]

Affiliations:

[1] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA

Source:

2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020) | 2020

Keywords:

semi-streaming architecture; streaming architecture; CNN; hardware accelerator; mobilenetv2; FPGA; SoC; embedded systems; inference; NEURAL-NETWORK;

DOI:

10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00046

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Recent research advances in deep learning have led to the development of small yet powerful Convolutional Neural Network (CNN) architectures. Meanwhile, Field Programmable Gate Arrays (FPGAs) have become a popular hardware target for their deployment, with implementations falling into two main categories: streaming hardware architectures and single computation engine designs. Streaming hardware architectures generally require implementing every layer as a discrete processing unit and are suitable for smaller software models whose unfolded versions fit into resource-constrained targets. Single computation engines, on the other hand, can be scaled to fit a device and execute CNN models of different sizes and complexities. However, the achievable performance of such one-size-fits-all implementations may vary across CNNs with different workload attributes, leading to inefficient utilization of hardware resources. By combining the advantages of both methods, this work proposes a new design paradigm called semi-streaming architecture, in which layer-specialized configurable engines are used for network realization. As a proof of concept, this paper presents a set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model. The engines are chained to partially preserve data streaming and tuned individually to efficiently process specific types of layers: normalized addition of residuals, depthwise, pointwise (expansion and projection), and standard 2D convolution layers, capable of delivering 5.4 GOp/s, 16 GOp/s, 27.2 GOp/s, 27.2 GOp/s, and 89.6 GOp/s, respectively. The overall energy efficiency is 5.32 GOp/s/W at a 100 MHz system clock, with a total power of 6.2 W on an XCZU7EV SoC FPGA.
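
The figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes the five throughput values map to the engines in the order listed (reading the two 27.2 GOp/s values as pointwise expansion and projection), and it assumes the overall efficiency is simply overall throughput divided by total power. The dictionary keys and the helper name `ops_per_cycle` are hypothetical labels, not names from the paper.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 100e6               # 100 MHz system clock
TOTAL_POWER_W = 6.2            # total power on the XCZU7EV
EFFICIENCY_GOPS_PER_W = 5.32   # overall energy efficiency

# Assumed mapping of the quoted throughputs to the engines (GOp/s),
# following the order in which the abstract lists them.
ENGINE_GOPS = {
    "residual_add": 5.4,
    "depthwise": 16.0,
    "pointwise_expansion": 27.2,
    "pointwise_projection": 27.2,
    "standard_conv2d": 89.6,
}


def ops_per_cycle(gops, clock_hz=CLOCK_HZ):
    """Operations completed per clock cycle at the given throughput."""
    return gops * 1e9 / clock_hz


# Overall throughput implied by efficiency x power (an inference, not a
# number stated in the abstract).
implied_overall_gops = EFFICIENCY_GOPS_PER_W * TOTAL_POWER_W

per_cycle = {name: ops_per_cycle(g) for name, g in ENGINE_GOPS.items()}

if __name__ == "__main__":
    print(f"implied overall throughput: {implied_overall_gops:.2f} GOp/s")
    for name, ops in per_cycle.items():
        print(f"{name}: {ops:.0f} ops/cycle")
```

Under these assumptions the implied overall throughput is about 33 GOp/s, well below the sum of the per-engine figures. That would be consistent with the per-engine numbers being individual rates for engines that do not all run at full rate at once, although the abstract does not say so explicitly.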

Pages: 160-167

Page count: 8
