An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Cited by: 66
Authors
Liu, Bing [1]
Zou, Danyin [1]
Feng, Lei [1]
Feng, Shou [1]
Fu, Ping [1]
Li, Junbao [1]
Affiliations
[1] Harbin Inst Technol, Sch Elect & Informat Engn, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
convolutional neural network (CNN); field programmable gate array (FPGA); depthwise separable convolution; accelerator; COPROCESSOR;
DOI
10.3390/electronics8030281
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Convolutional neural networks (CNNs) have been used in many fields, such as image classification, face detection, and speech recognition, and have achieved remarkable results. Compared with GPU (graphics processing unit) and ASIC implementations, an FPGA (field programmable gate array)-based CNN accelerator has great advantages owing to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's large number of parameters and high computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and using the roofline model to balance resource and bandwidth constraints, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator can handle network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves full pipelining by using a data stream interface and a ping-pong on-chip cache. The experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also supporting depthwise separable convolution, which is an obvious advantage over other designs.
Pages: 18
Related papers
50 in total
  • [31] Advantages and limitations of fully on-chip CNN FPGA-based hardware accelerator
    Dinelli, Gianmarco
    Meoni, Gabriele
    Rapuano, Emilio
    Fanucci, Luca
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [32] Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
    Ayat, Sayed Omid
    Khalil-Hani, Mohamed
    Ab Rahman, Ab Al-Hadi
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2018, 26 (02) : 919 - 935
  • [33] Optimizing depthwise separable convolution on DCU
    Zheng Liu
    Meng Hao
    Weizhe Zhang
    Gangzhao Lu
    Xueyang Tian
    Siyu Yang
    Mingdong Xie
    Jie Dai
    Chenyu Yuan
    Desheng Wang
    Hongwei Yang
    CCF Transactions on High Performance Computing, 2024, 6 (6) : 646 - 664
  • [34] Optimizing Depthwise Separable Convolution Operations on GPUs
    Lu, Gangzhao
    Zhang, Weizhe
    Wang, Zheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (01) : 70 - 87
  • [35] Fast Depthwise Separable Convolution for Embedded Systems
    Yoo, Byeongheon
    Choi, Yongjun
    Choi, Heeyoul
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT VII, 2018, 11307 : 656 - 665
  • [36] Recognition method for adhesive fish based on depthwise separable convolution network
    Zhang L.
    Li D.
    Cao X.
    Li W.
    Tian G.
    Duan Q.
    Duan, Qingling (dqling@cau.edu.cn), 1600, Chinese Society of Agricultural Engineering (37): : 160 - 167
  • [37] Traffic flow prediction based on depthwise separable convolution fusion network
    Yue Yu
    Wei Sun
    Jianhua Liu
    Changfan Zhang
    Journal of Big Data, 9
  • [38] Depthwise Separable Convolution based Lightweight HSRRS Image Classification Method
    Luo, Wang
    Li, Tong
    Yang, Weidong
    Yu, Tongwei
    Xi, Dingding
    Shen, Li
    Xia, Yuan
    Yang, Zhibin
    Xu, Huarong
    2020 12TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2020, : 586 - 590
  • [39] Improving the computational efficiency and flexibility of FPGA-based CNN accelerator through loop optimization
    Liu, Yuhao
    Ma, Yanhua
    Zhang, Bowei
    Liu, Lu
    Wang, Jie
    Tang, Shibo
    MICROELECTRONICS JOURNAL, 2024, 147
  • [40] An FPGA-Based accelerator for multiphysics modeling
    Huang, XM
    Ma, J
    ERSA '04: THE 2004 INTERNATIONAL CONFERENCE ON ENGINEERING OF RECONFIGURABLE SYSTEMS AND ALGORITHMS, 2004, : 209 - 212