An Energy-Efficient and Area-Efficient Depthwise Separable Convolution Accelerator with Minimal On-Chip Memory Access

Cited by: 0
Authors
Chen, Yi [1]
Lou, Jie [1]
Lanius, Christian [1]
Freye, Florian [1]
Loh, Johnson [1]
Gemmeke, Tobias [1]
Affiliations
[1] RWTH Aachen University, Chair of Integrated Digital Systems and Circuit Design, Aachen, Germany
Keywords
Depthwise separable convolution; hardware accelerator; PE utilization; energy-efficient design; area-efficient design; memory access
DOI
10.1109/VLSI-SoC57769.2023.10321918
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Depthwise separable convolution (DSC) has emerged as a crucial building block for developing lightweight convolutional neural networks (CNNs). In this paper, we present a hardware accelerator for DSC that enables 100% utilization of the processing element (PE) array for depthwise convolution (DWC) and achieves up to 98% utilization for pointwise convolution (PWC), while also reducing latency. By partitioning the input feature map (ifmap) SRAM of the DWC into three banks, we minimize memory access and maximize data reuse. The input activations and weights need to be loaded only once from SRAM to the PEs for both DWC and PWC. Additionally, to support efficient operation across different layers, we present a layerwise matching method. The proposed DSC accelerator is implemented in 22 nm FDSOI technology and validated using MobileNetV1 on the CIFAR10 dataset. The post-layout results demonstrate that the proposed accelerator can operate at 1 GHz and achieve an energy efficiency of 5.07 (3.96) TOPS/W and an area efficiency of 519.2 (461.52) GOPS/mm² for DWC (PWC) at 0.8 V. After scaling the supply voltage down to 0.5 V, the energy efficiency of the proposed accelerator increases to 13.64 TOPS/W for DWC and 10.64 TOPS/W for PWC.
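As a rough illustration of why DSC is regarded as lightweight, the sketch below counts multiply-accumulate (MAC) operations for a standard convolution and for its depthwise-plus-pointwise replacement. It assumes stride 1 and "same" padding, so the output keeps the input's height and width. The function names and the example layer shape are hypothetical and are not taken from the paper.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def dsc_macs(h, w, c_in, c_out, k):
    """MACs for depthwise separable convolution, returned as (DWC, PWC)."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise, pointwise


# Hypothetical layer shape, chosen only for illustration.
h, w, c_in, c_out, k = 32, 32, 32, 64, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dwc, pwc = dsc_macs(h, w, c_in, c_out, k)
ratio = (dwc + pwc) / std  # algebraically equal to 1/c_out + 1/k**2

print(f"standard: {std} MACs")
print(f"DSC:      {dwc} (DWC) + {pwc} (PWC) MACs")
print(f"DSC / standard: {ratio:.4f}")
```

In this example the pointwise stage accounts for most of the remaining MACs, which is one reason an accelerator has to keep its PE array busy in both the DWC and PWC modes.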
Pages: 50-55
Page count: 6