An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

Cited by: 13
Authors
Chen, Qinyu [1]
Huang, Yan [1]
Sun, Rui [1]
Song, Wenqing [1]
Lu, Zhonghai [2]
Fu, Yuxiang [1]
Li, Li [1]
Affiliations
[1] Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China
[2] KTH Royal Inst Technol, S-11428 Stockholm, Sweden
Keywords
Data processing; Computer architecture; Very large scale integration; Hardware; Registers; Microsoft Windows; Kernel; Dilated convolutions (DCONVs) and transposed convolutions (TCONVs); load balance; sparsity; VLSI; ARCHITECTURE;
DOI
10.1109/TVLSI.2020.2976454
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Convolutional neural networks (CNNs) have emerged as one of the most popular ways applied in many fields. These networks deliver better performance when going deeper and larger. However, the complicated computation and huge storage impede hardware implementation. To address the problem, quantized networks are proposed. Besides, various convolutional structures are designed to meet the requirements of different applications. For example, compared with the traditional convolutions (CONVs) for image classification, CONVs for image generation are usually composed of traditional CONVs, dilated CONVs, and transposed CONVs, leading to a difficult hardware mapping problem. In this brief, we translate the difficult mapping problem into the sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs by exploiting the sparsity and low bit-width characteristics. To this end, we propose an ineffectual data removing (IDR) mechanism to remove both the regular and irregular sparsity based on dual-channel processing elements (PEs). Besides, a flexible layered load balance (LLB) mechanism is introduced to alleviate the load imbalance. The accelerator is implemented with 65-nm technology with a core size of 2.56 mm². It can achieve 3.72-TOPS/W energy efficiency at 50.1 mW, which makes it a promising design for embedded devices.
Pages: 1540-1544
Number of pages: 5
Related papers
50 in total
  • [31] Perspective-Adaptive Convolutions for Scene Parsing
    Zhang, Rui
    Tang, Sheng
    Zhang, Yongdong
    Li, Jintao
    Yan, Shuicheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (04) : 909 - 924
  • [32] FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access
    Meng, Jian
    Venkataramanaiah, Shreyas Kolala
    Zhou, Chuteng
    Hansen, Patrick
    Whatmough, Paul
    Seo, Jae-sun
    2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021), 2021, : 9 - 16
  • [33] EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning
    Kim, Junsoo
    Han, Seunghee
    Ko, Geonwoo
    Kim, Ji-Hoon
    Lee, Changha
    Kim, Taewoo
    Youn, Chan-Hyun
    Kim, Joo-Young
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2024, 59 (03) : 830 - 841
  • [34] SPACE: Sparsity Propagation Based DCNN Training Accelerator on Edge
    Wang, Miao
    Chen, Zhen
    Li, Chuxi
    Yang, Zhao
    Li, Lei
    Zhang, Meng
    Zhang, Shengbing
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT II, 2022, 13156 : 391 - 405
  • [35] AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers
    Tuli S.
    Jha N.K.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42 (11) : 4038 - 4051
  • [36] Structural Sparsity in Multiple Measurements
    Bossmann, F.
    Krause-Solberg, S.
    Maly, J.
    Sissouno, N.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 280 - 291
  • [37] SPARSITY IN MULTIPLE KERNEL LEARNING
    Koltchinskii, Vladimir
    Yuan, Ming
    ANNALS OF STATISTICS, 2010, 38 (06) : 3660 - 3695
  • [38] Skip-Convolutions for Efficient Video Processing
    Habibian, Amirhossein
    Abati, Davide
    Cohen, Taco S.
    Bejnordi, Babak Ehteshami
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2694 - 2703
  • [39] FIR perfect signal reconstruction from multiple convolutions: Minimum deconvolver orders
    Harikumar, G
    Bresler, Y
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1998, 46 (01) : 215 - 218
  • [40] AGGREGATED DILATED CONVOLUTIONS FOR EFFICIENT MOTION DEBLURRING
    Miao, Hong
    Zhang, Wenqiang
    Bai, Jiansong
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,