CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC) number
TP3 [computing technology; computer technology]
Subject classification code
0812
Abstract
Computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative array-wise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the proposed framework is demonstrated through experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency is improved by up to 222%, with a 58.97% improvement in accuracy compared to conventional quantization methods.
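The abstract outlines four techniques at a high level. As a rough illustration of two of them, the short Python/NumPy sketch below shows (i) an array-wise weight quantization granularity, i.e., one scale factor per crossbar-array tile rather than per layer or per channel, and (ii) a clipped uniform quantization of partial sums that mimics a low-resolution ADC. The tile size, scale rule, clipping threshold alpha, and all function names are assumptions made for illustration; this is not the paper's implementation, and CIMQ's learned, reparametrized clipping function is not reproduced here.

    # Illustrative sketch only: the tiling rule, scale selection, and clipping
    # parameterization are assumptions, not CIMQ's exact method.
    import numpy as np

    def arraywise_weight_quant(W, n_bits=4, array_rows=64, array_cols=64):
        """Quantize a 2-D weight matrix with one scale per crossbar-array tile
        (array-wise granularity) instead of one scale per layer or channel."""
        qmax = 2 ** (n_bits - 1) - 1
        Wq = np.empty_like(W, dtype=np.float64)
        scales = {}
        for r0 in range(0, W.shape[0], array_rows):
            for c0 in range(0, W.shape[1], array_cols):
                tile = W[r0:r0 + array_rows, c0:c0 + array_cols]
                s = np.abs(tile).max() / qmax + 1e-12   # one scale per physical array
                Wq[r0:r0 + array_rows, c0:c0 + array_cols] = (
                    np.clip(np.round(tile / s), -qmax, qmax) * s)
                scales[(r0, c0)] = s
        return Wq, scales

    def clipped_partial_sum_quant(psum, alpha, n_bits=6):
        """Clip partial sums to [-alpha, alpha] and quantize them uniformly,
        mimicking a low-resolution ADC. CIMQ learns the clipping threshold via
        a reparametrized clipping function; here alpha is simply given."""
        qmax = 2 ** (n_bits - 1) - 1
        step = alpha / qmax
        clipped = np.clip(psum, -alpha, alpha)
        return np.round(clipped / step) * step

    # Tiny usage example with random data (hypothetical shapes).
    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 128))
    x = rng.standard_normal(128)
    Wq, _ = arraywise_weight_quant(W, n_bits=4)
    psum = x[:64] @ Wq[:64, :]              # contribution of the first 64-row array
    psum_q = clipped_partial_sum_quant(psum, alpha=3.0 * psum.std(), n_bits=6)

One plausible reading of the array-wise granularity, consistent with the abstract's motivation, is that each physical array accumulates its own partial sums before the ADC, so a per-array scale can be folded into a single digital rescaling per array output; the sketch above only mirrors that granularity numerically.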
Pages: 189-202
Page count: 14
Related papers
50 records in total
  • [21] DQI: A Dynamic Quantization Method for Efficient Convolutional Neural Network Inference Accelerators
    Wang, Yun
    Liu, Qiang
    Yan, Shun
    2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022, : 231 - 231
  • [22] Low-power hardware-efficient memory-based DCT processor
    Sadaghiani, AbdolVahab Khalili
    Forouzandeh, Behjat
    Journal of Real-Time Image Processing, 2022, 19 : 1105 - 1121
  • [23] Low-power hardware-efficient memory-based DCT processor
    Sadaghiani, AbdolVahab Khalili
    Forouzandeh, Behjat
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2022, 19 (06) : 1105 - 1121
  • [24] Reconfigurable and hardware efficient adaptive quantization model-based accelerator for binarized neural network
    Sasikumar, A.
    Ravi, Logesh
    Kotecha, Ketan
    Indragandhi, V
    Subramaniyaswamy, V
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [25] Reconfigurable and hardware efficient adaptive quantization model-based accelerator for binarized neural network
    Sasikumar, A.
    Ravi, Logesh
    Kotecha, Ketan
    Indragandhi, V.
    Subramaniyaswamy, V.
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 102
  • [26] Data-Mining-Based Hardware-Efficient Neural Network Controller for DC-DC Switching Converters
    Liu, Jianfu
    Wei, Tingcun
    Chen, Nan
    Liu, Wei
    Wu, Jiayu
    Xiao, Peilei
    IEEE JOURNAL OF EMERGING AND SELECTED TOPICS IN POWER ELECTRONICS, 2023, 11 (04) : 4222 - 4232
  • [27] A Hardware-Efficient EMG Decoder with an Attractor-based Neural Network for Next-Generation Hand Prostheses
    Kalbasi, Mohammad
    Shaeri, MohammadAli
    Mendez, Vincent Alexandre
    Shokur, Solaiman
    Micera, Silvestro
    Shoaran, Mahsa
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 532 - 536
  • [28] Pulse-based Feature Extraction for Hardware-efficient Neural Recording Systems
    Bhaduri, Aritra
    Yao, Enyi
    Basu, Arindam
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 1842 - 1845
  • [29] DE-C3: Dynamic Energy-Aware Compression for Computing-In-Memory-Based Convolutional Neural Network Acceleration
    Wu, Guan-Wei
    Chang, Cheng-Yang
    Wu, An-Yeu
    2023 IEEE 36TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE, SOCC, 2023, : 90 - 95
  • [30] Memory Efficient Training using Lookup-Table-based Quantization for Neural Network
    Onishi, Kazuki
    Yu, Jaehoon
    Hashimoto, Masanori
    2020 2ND IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2020), 2020, : 251 - 255