Efficient depthwise separable convolution accelerator for classification and UAV object detection

Times cited: 22

Authors:

Li, Guoqing [1]

Zhang, Jingwei [1]

Zhang, Meng [1]

Wu, Ruixia [2]

Cao, Xinye [1]

Liu, Wenzhao [1]

Affiliations:

[1] Southeast Univ, Natl ASIC Engn Technol Res Ctr, Sch Elect Sci & Engn, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Microelect, Nanjing 210096, Peoples R China

Source:

NEUROCOMPUTING | 2022 / Vol. 490

Funding:

National Key Research and Development Program of China;

Keywords:

Depthwise separable convolutions; Convolutional neural networks; Hardware accelerator; FPGA; Object detection; DEEP NEURAL-NETWORKS; FPGA IMPLEMENTATION; CNN ACCELERATOR;

DOI:

10.1016/j.neucom.2022.02.071

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Depthwise separable convolutions (DSC) have been widely deployed in lightweight convolutional neural networks because of their high efficiency. However, the speedup that Graphics Processing Units (GPUs) achieve on DSC falls short of the theoretical expectation. In this paper, several approaches are proposed for accelerating DSC on Field-Programmable Gate Arrays (FPGAs). For the early layers, S2C (spatial to channel) is proposed to accelerate computation and improve the utilization of computational resources and bandwidth. An efficient SharePE is proposed to accelerate DSC by improving the efficiency of computing resources. A regulable parallelism approach is proposed to compute the different pointwise convolutional layers efficiently. A P2D&D2P approach is proposed to reduce external memory access. At the system level, a pre-load workflow is proposed to reduce the accelerator's waiting time between two images. We demonstrated our approaches on SkyNet using the Ultra96V2 development board. The results indicate that the proposed accelerator achieves 80.030 frames per second and 0.072 J per image for UAV object detection, which are state-of-the-art results for SkyNet. In addition, MobileNetV2 was implemented on the larger XC7Z100 FPGA, where the accelerator classified each ImageNet image in 2.69 ms. Code is available at https://github.com/AlLearnerLi/DAC-SDC-2020-SEUer. (C) 2022 Published by Elsevier B.V.
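
The abstract names the techniques without detailing them. The Python sketch below illustrates, under stated assumptions, the arithmetic behind the DSC motivation and one plausible reading of S2C. It is not the authors' implementation; their code is in the linked repository. Several parts are assumptions:
- S2C is modeled as a generic space-to-depth rearrangement, and the paper's actual scheme may differ.
- The 56 x 56 x 128 layer and the 16-lane channel-parallel array are hypothetical, and so are the helper names.
- The last two lines only derive quantities implied by the reported figures. They assume the energy and frame-rate numbers describe the same steady-state run.

```python
import math

# Illustrative sketch only: the cost formulas are generic, and S2C is ASSUMED
# here to behave like a space-to-depth rearrangement (hypothetical helpers).


def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k standard convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k


def dsc_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def space_to_depth(image, block):
    """Fold each block x block spatial patch into the channel axis.

    `image` is a nested list indexed [row][col][channel]; the result has
    shape (rows // block, cols // block, channels * block * block).
    """
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image size must be divisible by block")
    out = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            pixel = []
            for dr in range(block):
                for dc in range(block):
                    pixel.extend(image[r + dr][c + dc])
            out_row.append(pixel)
        out.append(out_row)
    return out


def lane_utilization(channels, lanes):
    """Fraction of busy lanes when `channels` are spread over a channel-parallel array."""
    return channels / (math.ceil(channels / lanes) * lanes)


if __name__ == "__main__":
    # DSC vs standard convolution for one hypothetical 56 x 56 x 128 layer, k = 3.
    std = standard_conv_macs(56, 56, 128, 128, 3)
    dsc = dsc_macs(56, 56, 128, 128, 3)
    print(f"MAC reduction: {std / dsc:.2f}x")  # MAC reduction: 8.41x

    # A 3-channel input on a hypothetical 16-lane array, before and after 2 x 2 S2C.
    img = [[[r, c, r + c] for c in range(4)] for r in range(4)]
    s2c = space_to_depth(img, 2)
    print(len(s2c), len(s2c[0]), len(s2c[0][0]))  # 2 2 12
    print(lane_utilization(3, 16), lane_utilization(12, 16))  # 0.1875 0.75

    # Average power implied by the reported SkyNet results (J/image * images/s).
    print(f"{0.072 * 80.030:.2f} W")  # 5.76 W
    # Throughput implied by 2.69 ms per ImageNet image.
    print(f"{1000 / 2.69:.1f} images/s")  # 371.7 images/s
```

With k = 3 and 128 output channels, the DSC-to-standard cost ratio is 1/c_out + 1/k^2, which is about 0.119 and gives the roughly 8.4x reduction printed above. Folding each 2 x 2 patch into channels quadruples the channel count of a 3-channel input layer. That is why a space-to-depth-style step can raise the utilization of channel-parallel hardware in early layers.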

Pages: 1-16

Page count: 16
