FPGA Implementation of MobileNetV2 CNN Model Using Semi-Streaming Architecture for Low Power Inference Applications

Cited by: 4
Authors
Shaydyuk, Nazariy K. [1]
John, Eugene B. [1]
Affiliations
[1] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX 78249 USA
Keywords
semi-streaming architecture; streaming architecture; CNN; hardware accelerator; mobilenetv2; FPGA; SoC; embedded systems; inference; NEURAL-NETWORK;
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00046
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Recent research advances in deep learning have led to the development of small and powerful Convolutional Neural Network (CNN) architectures. Meanwhile, Field Programmable Gate Arrays (FPGAs) have become a popular hardware target for their deployment, with implementations falling into two main categories: streaming hardware architectures and single computation engine designs. Streaming hardware architectures generally require implementing every layer as a discrete processing unit, and are suitable for smaller software models whose unfolded versions fit into resource-constrained targets. Single computation engines, on the other hand, can be scaled to fit a device and execute CNN models of different sizes and complexities. However, the achievable performance of such one-size-fits-all implementations may vary across CNNs with different workload attributes, leading to inefficient utilization of hardware resources. By combining the advantages of both methods, this work proposes a new design paradigm called semi-streaming architecture, in which layer-specialized configurable engines are used for network realization. As a proof of concept, this paper presents a set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model. The engines are chained to partially preserve data streaming and are tuned individually to efficiently process specific types of layers: normalized addition of residuals, depthwise, pointwise (expansion and projection), and standard 2D convolution layers. They are capable of delivering 5.4 GOp/s, 16 GOp/s, 27.2 GOp/s, 27.2 GOp/s, and 89.6 GOp/s, respectively, with an overall energy efficiency of 5.32 GOp/s/W at a 100 MHz system clock, requiring a total power of 6.2 W on an XCZU7EV SoC FPGA.
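The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper. It assumes that the per-engine throughputs are peak values at the stated 100 MHz clock, that "respectively" maps the five numbers onto the engines in the order they are listed, and that overall energy efficiency is effective throughput divided by total power. The dictionary keys and function names are illustrative only.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract.
# Assumptions (not stated in the source): per-engine figures are peak
# throughputs at the 100 MHz clock, and energy efficiency is defined as
# effective throughput divided by total power.

CLOCK_HZ = 100e6               # system clock, from the abstract
TOTAL_POWER_W = 6.2            # total power, from the abstract
EFFICIENCY_GOPS_PER_W = 5.32   # overall energy efficiency, from the abstract

# Throughput per engine in GOp/s, in the order listed in the abstract.
# The key names are hypothetical labels, not identifiers from the paper.
ENGINE_GOPS = {
    "residual_add": 5.4,
    "depthwise": 16.0,
    "pointwise_expansion": 27.2,
    "pointwise_projection": 27.2,
    "standard_conv2d": 89.6,
}


def ops_per_cycle(gops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Operations completed per clock cycle for a given GOp/s figure."""
    return gops * 1e9 / clock_hz


def implied_throughput_gops(
    efficiency: float = EFFICIENCY_GOPS_PER_W,
    power_w: float = TOTAL_POWER_W,
) -> float:
    """Overall throughput implied by efficiency (GOp/s/W) times power (W)."""
    return efficiency * power_w


if __name__ == "__main__":
    for name, gops in ENGINE_GOPS.items():
        print(f"{name}: {ops_per_cycle(gops):.0f} ops/cycle")
    print(f"implied overall throughput: {implied_throughput_gops():.2f} GOp/s")
```

Under these assumptions, the implied overall throughput is well below the sum of the per-engine figures. That is what one would expect if the per-engine numbers are peaks and the chained engines are not all busy at the same time, but the abstract does not confirm this reading.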
Pages: 160-167
Page count: 8