CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC) number
TP3 [computing technology; computer technology]
Subject classification code
0812
Abstract
Computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative array-wise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the proposed framework is demonstrated through experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency is improved by up to 222%, with a 58.97% improvement in accuracy compared to conventional quantization methods.
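The abstract outlines four techniques at a high level. As a rough illustration of two of them, the short Python/NumPy sketch below shows (i) an array-wise weight quantization granularity, i.e., one scale factor per crossbar-array tile rather than per layer or per channel, and (ii) a clipped uniform quantization of partial sums that mimics a low-resolution ADC. The tile size, scale rule, clipping threshold alpha, and all function names are assumptions made for illustration; this is not the paper's implementation, and CIMQ's learned, reparametrized clipping function is not reproduced here.

    # Illustrative sketch only: the tiling rule, scale selection, and clipping
    # parameterization are assumptions, not CIMQ's exact method.
    import numpy as np

    def arraywise_weight_quant(W, n_bits=4, array_rows=64, array_cols=64):
        """Quantize a 2-D weight matrix with one scale per crossbar-array tile
        (array-wise granularity) instead of one scale per layer or channel."""
        qmax = 2 ** (n_bits - 1) - 1
        Wq = np.empty_like(W, dtype=np.float64)
        scales = {}
        for r0 in range(0, W.shape[0], array_rows):
            for c0 in range(0, W.shape[1], array_cols):
                tile = W[r0:r0 + array_rows, c0:c0 + array_cols]
                s = np.abs(tile).max() / qmax + 1e-12   # one scale per physical array
                Wq[r0:r0 + array_rows, c0:c0 + array_cols] = (
                    np.clip(np.round(tile / s), -qmax, qmax) * s)
                scales[(r0, c0)] = s
        return Wq, scales

    def clipped_partial_sum_quant(psum, alpha, n_bits=6):
        """Clip partial sums to [-alpha, alpha] and quantize them uniformly,
        mimicking a low-resolution ADC. CIMQ learns the clipping threshold via
        a reparametrized clipping function; here alpha is simply given."""
        qmax = 2 ** (n_bits - 1) - 1
        step = alpha / qmax
        clipped = np.clip(psum, -alpha, alpha)
        return np.round(clipped / step) * step

    # Tiny usage example with random data (hypothetical shapes).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 128))
    x = rng.standard_normal(128)
    Wq, _ = arraywise_weight_quant(W, n_bits=4)
    psum = x[:64] @ Wq[:64, :]              # contribution of the first 64-row array
    psum_q = clipped_partial_sum_quant(psum, alpha=3.0 * psum.std(), n_bits=6)

One plausible reading of the array-wise granularity, consistent with the abstract's motivation, is that each physical array accumulates its own partial sums before the ADC, so a per-array scale can be folded into a single digital rescaling per array output; the sketch above only mirrors that granularity numerically.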
Pages: 189-202
Page count: 14
Related papers
50 records in total
  • [21] DQI: A Dynamic Quantization Method for Efficient Convolutional Neural Network Inference Accelerators
    Wang, Yun
    Liu, Qiang
    Yan, Shun
    2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022, : 231 - 231
  • [22] Low-power hardware-efficient memory-based DCT processor
    Sadaghiani, AbdolVahab Khalili
    Forouzandeh, Behjat
    Journal of Real-Time Image Processing, 2022, 19 : 1105 - 1121
  • [23] Low-power hardware-efficient memory-based DCT processor
    Sadaghiani, AbdolVahab Khalili
    Forouzandeh, Behjat
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2022, 19 (06) : 1105 - 1121
  • [24] Reconfigurable and hardware efficient adaptive quantization model-based accelerator for binarized neural network
    Sasikumar, A.
    Ravi, Logesh
    Kotecha, Ketan
    Indragandhi, V
    Subramaniyaswamy, V
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [25] Reconfigurable and hardware efficient adaptive quantization model-based accelerator for binarized neural network
    Sasikumar, A.
    Ravi, Logesh
    Kotecha, Ketan
    Indragandhi, V.
    Subramaniyaswamy, V.
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [26] Data-Mining-Based Hardware-Efficient Neural Network Controller for DC-DC Switching Converters
    Liu, Jianfu
    Wei, Tingcun
    Chen, Nan
    Liu, Wei
    Wu, Jiayu
    Xiao, Peilei
    IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN POWER ELECTRONICS, 2023, 11 (04) : 4222 - 4232
  • [27] A Hardware-Efficient EMG Decoder with an Attractor-based Neural Network for Next-Generation Hand Prostheses
    Kalbasi, Mohammad
    Shaeri, MohammadAli
    Mendez, Vincent Alexandre
    Shokur, Solaiman
    Micera, Silvestro
    Shoaran, Mahsa
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 532 - 536
  • [28] Pulse-based Feature Extraction for Hardware-efficient Neural Recording Systems
    Bhaduri, Aritra
    Yao, Enyi
    Basu, Arindam
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 1842 - 1845
  • [29] DE-C3: Dynamic Energy-Aware Compression for Computing-In-Memory-Based Convolutional Neural Network Acceleration
    Wu, Guan-Wei
    Chang, Cheng-Yang
    Wu, An-Yeu
    2023 IEEE 36TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE, SOCC, 2023, : 90 - 95
  • [30] Memory Efficient Training using Lookup-Table-based Quantization for Neural Network
    Onishi, Kazuki
    Yu, Jaehoon
    Hashimoto, Masanori
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 251 - 255