Efficient depthwise separable convolution accelerator for classification and UAV object detection

Cited by: 22
Authors
Li, Guoqing [1]
Zhang, Jingwei [1]
Zhang, Meng [1]
Wu, Ruixia [2]
Cao, Xinye [1]
Liu, Wenzhao [1]
Affiliations
[1] Southeast Univ, Natl ASIC Engn Technol Res Ctr, Sch Elect Sci & Engn, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Microelect, Nanjing 210096, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Depthwise separable convolutions; Convolutional neural networks; Hardware accelerator; FPGA; Object detection; Deep neural networks; FPGA implementation; CNN accelerator
DOI
10.1016/j.neucom.2022.02.071
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Depthwise separable convolutions (DSC) are widely deployed in lightweight convolutional neural networks because of their high efficiency. However, Graphics Processing Units accelerate DSC less well in practice than theory predicts. This paper proposes several approaches for accelerating DSC on a Field-Programmable Gate Array (FPGA). For the preceding layers, S2C (spatial to channel) is proposed to speed up computation and improve the utilization of computational resources and bandwidth. An efficient SharePE is proposed to accelerate the DSC and improve the efficiency of the computing resources. A regulable parallelism approach is proposed to compute the different pointwise convolutional layers efficiently. The P2D&D2P approach is proposed to reduce external memory access. For the entire accelerating system, a pre-load workflow is proposed to reduce the time the accelerator waits between two images. The approaches are demonstrated on SkyNet using the Ultra96V2 development board. The proposed accelerator achieves 80.030 frames per second and 0.072 Joule per image for UAV object detection, which are state-of-the-art results for SkyNet. In addition, the MobileNetV2 model was implemented on a larger XC7Z100 FPGA, where the accelerator classified each ImageNet picture in 2.69 ms. Code is available at https://github.com/AlLearnerLi/DAC-SDC-2020-SEUer. © 2022 Published by Elsevier B.V.
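The efficiency claim for DSC in the abstract can be illustrated with a multiply-accumulate (MAC) count. The sketch below is not taken from the paper: the function names and the layer dimensions (56 x 56 feature map, 128 input and output channels, 3 x 3 kernel) are hypothetical, and it assumes stride 1 with padding that preserves the spatial size. It compares a standard convolution against a depthwise convolution followed by a 1 x 1 pointwise convolution.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def dsc_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise separable convolution with the same output shape."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration (not from the paper).
h, w, c_in, c_out, k = 56, 56, 128, 128, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dsc = dsc_macs(h, w, c_in, c_out, k)
ratio = dsc / std  # algebraically equal to 1/c_out + 1/k**2

print(f"standard: {std} MACs, DSC: {dsc} MACs, ratio: {ratio:.4f}")
```

Under these assumptions the ratio reduces to 1/c_out + 1/k^2, so for a 3 x 3 kernel and many output channels the DSC needs roughly one ninth of the MACs. This is a theoretical operation count only; as the abstract notes, measured speedups on real hardware can fall short of it.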
Pages: 1-16
Page count: 16