An Efficient Accelerator for Multiple Convolutions From the Sparsity Perspective

Cited by: 13
Authors
Chen, Qinyu [1 ]
Huang, Yan [1 ]
Sun, Rui [1 ]
Song, Wenqing [1 ]
Lu, Zhonghai [2 ]
Fu, Yuxiang [1 ]
Li, Li [1 ]
Affiliations
[1] Nanjing Univ, Sch Elect & Engn, Nanjing 210000, Peoples R China
[2] KTH Royal Inst Technol, S-11428 Stockholm, Sweden
Keywords
Data processing; Computer architecture; Very large scale integration; Hardware; Registers; Microsoft Windows; Kernel; Dilated convolutions (DCONVs) and transposed convolutions (TCONVs); load balance; sparsity; VLSI; ARCHITECTURE
DOI
10.1109/TVLSI.2020.2976454
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Convolutional neural networks (CNNs) have become one of the most widely used models across many fields, and they deliver better performance as they grow deeper and larger. However, their heavy computation and large storage requirements impede hardware implementation; quantized networks have been proposed to address this problem. In addition, various convolutional structures have been designed to meet the requirements of different applications. For example, whereas image classification relies mainly on traditional convolutions (CONVs), image generation usually combines traditional CONVs, dilated CONVs, and transposed CONVs, leading to a difficult hardware mapping problem. In this brief, we translate this difficult mapping problem into a sparsity problem and propose an efficient hardware architecture for sparse binary and ternary CNNs that exploits their sparsity and low bit-width characteristics. To this end, we propose an ineffectual data removing (IDR) mechanism, built on dual-channel processing elements (PEs), that removes both regular and irregular sparsity. In addition, a flexible layered load balance (LLB) mechanism is introduced to alleviate load imbalance. The accelerator is implemented in 65-nm technology with a core size of 2.56 mm². It achieves an energy efficiency of 3.72 TOPS/W at 50.1 mW, making it a promising design for embedded devices.
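The reduction mentioned in the abstract, from dilated and transposed CONVs to ordinary CONVs over zero-expanded operands, can be made concrete with a short software model. The NumPy sketch below is illustrative only and is not the paper's hardware design; the helper names (dilate_kernel, upsample_input, sparse_conv2d) are hypothetical, and the transposed-CONV helper omits padding and kernel-flipping details. It shows why both structures introduce regular (structured) zeros and why skipping zero operands, the idea behind the IDR mechanism, saves work for ternary data.

    import numpy as np

    def dilate_kernel(k, rate):
        # A dilated CONV equals an ordinary CONV with zeros inserted between
        # kernel taps, so the extra taps are regular (structured) sparsity.
        kh, kw = k.shape
        out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1), dtype=k.dtype)
        out[::rate, ::rate] = k
        return out

    def upsample_input(x, stride):
        # The core of a transposed CONV is an ordinary CONV over a zero-stuffed
        # input (output padding and kernel flipping are omitted in this sketch).
        h, w = x.shape
        out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
        out[::stride, ::stride] = x
        return out

    def sparse_conv2d(x, k):
        # "Valid" 2-D sliding-window product that skips every zero operand pair,
        # a software analogue of removing ineffectual data before the PEs.
        kh, kw = k.shape
        oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
        taps = [(i, j, int(k[i, j]))              # enumerate nonzero taps once
                for i in range(kh) for j in range(kw) if k[i, j] != 0]
        y = np.zeros((oh, ow), dtype=np.int64)
        for oy in range(oh):
            for ox in range(ow):
                acc = 0
                for i, j, w in taps:
                    a = int(x[oy + i, ox + j])
                    if a != 0:                    # skip ineffectual activations
                        acc += a * w
                y[oy, ox] = acc
        return y

    x = np.random.randint(-1, 2, (8, 8))          # ternary activations {-1,0,1}
    k = np.random.randint(-1, 2, (3, 3))          # ternary weights     {-1,0,1}
    y_dilated = sparse_conv2d(x, dilate_kernel(k, 2))      # dilated CONV, rate 2
    y_transposed = sparse_conv2d(upsample_input(x, 2), k)  # transposed-CONV core

Because both reductions express the exotic layers as ordinary CONVs whose extra zeros are known in advance, a single zero-skipping datapath can serve traditional, dilated, and transposed layers alike, which is the sense in which the mapping problem becomes a sparsity problem.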
Pages: 1540-1544
Page count: 5
Related papers
50 records in total
  • [1] Runtime Reconfigurable Hardware Accelerator for Energy-Efficient Transposed Convolutions
    Marrazzo, Emanuel
    Spagnolo, Fanny
    Perri, Stefania
    PRIME 2022: 17TH INTERNATIONAL CONFERENCE ON PHD RESEARCH IN MICROELECTRONICS AND ELECTRONICS, 2022, : 49 - 52
  • [2] An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity
    Xu, Mingyang
    Lu, Jinming
    Wang, Zhongfeng
    Lin, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 230 - 233
  • [3] Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference
    Verelst, Thomas
    Tuytelaars, Tinne
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2317 - 2326
  • [4] AdaS: A Fast and Energy-Efficient CNN Accelerator Exploiting Bit-Sparsity
    Lin, Xiaolong
    Li, Gang
    Liu, Zizhao
    Liu, Yadong
    Zhang, Fan
    Song, Zhuoran
    Jing, Naifeng
    Liang, Xiaoyao
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [5] SparseNN: An Energy-Efficient Neural Network Accelerator Exploiting Input and Output Sparsity
    Zhu, Jingyang
    Jiang, Jingbo
    Chen, Xizi
    Tsui, Chi-Ying
    PROCEEDINGS OF THE 2018 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2018, : 241 - 244
  • [6] Efficient Multiple Solutions for Changes in a Network Using Sparsity Techniques
    Brameller, A
    PROCEEDINGS OF THE INSTITUTION OF ELECTRICAL ENGINEERS-LONDON, 1973, 120 (05) : 607 - 608
  • [7] A Flexible Sparsity-Aware Accelerator with High Sensitivity and Efficient Operation for Convolutional Neural Networks
    Yuan, Haiying
    Zeng, Zhiyong
    Cheng, Junpeng
    Li, Minghao
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4370 - 4389
  • [8] Selective Pruning of Sparsity-Supported Energy-Efficient Accelerator for Convolutional Neural Networks
    Liu, Chia-Chi
    Zhang, Xuezhi
    Wey, I-Chyn
    Teo, T. Hui
    2023 IEEE 16TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP, MCSOC, 2023, : 454 - 461
  • [9] An Efficient Window-Based Vision Transformer Accelerator via Mixed-Granularity Sparsity
    Dong, Qiwei
    Zhang, Siyu
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2025,