CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC): TP3 [Computing technology; computer technology]
Discipline Classification Code: 0812
Abstract
The emerging computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional neural network (NN) quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework targets the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs) with four innovative techniques. First, bit-level-sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization dropping strategy. The effectiveness of the proposed framework has been demonstrated through experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy, compared to conventional quantization methods.
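The abstract describes arraywise weight quantization and clipped partial-sum quantization only at a high level. The NumPy sketch below illustrates the general idea under illustrative assumptions: a fixed 128x128 crossbar tile size, symmetric uniform quantizers, and a fixed clipping threshold alpha standing in for the paper's learned, reparametrized clipping function. The function names and parameter values are hypothetical and are not taken from the paper.

import numpy as np

def quantize_weights_arraywise(W, n_bits=4, array_rows=128, array_cols=128):
    """Arraywise weight quantization sketch: one scale per crossbar-sized tile.

    Assumes the weight matrix is partitioned into (array_rows x array_cols)
    tiles, each mapped to one CIM array; tile size and symmetric uniform
    quantization are illustrative assumptions, not the paper's exact scheme.
    """
    q_max = 2 ** (n_bits - 1) - 1
    W_q = np.zeros_like(W, dtype=np.float64)
    scales = {}
    for r0 in range(0, W.shape[0], array_rows):
        for c0 in range(0, W.shape[1], array_cols):
            tile = W[r0:r0 + array_rows, c0:c0 + array_cols]
            t_max = np.abs(tile).max()
            scale = t_max / q_max if t_max > 0 else 1.0
            q = np.clip(np.round(tile / scale), -q_max - 1, q_max)
            W_q[r0:r0 + array_rows, c0:c0 + array_cols] = q * scale
            scales[(r0, c0)] = scale
    return W_q, scales

def quantize_partial_sum(psum, alpha=3.0, adc_bits=4):
    """Clipped uniform quantization of analog partial sums (ADC model sketch).

    alpha acts as a fixed clipping threshold; in the paper it is learned via a
    reparametrized clipping function, which is omitted here for brevity.
    """
    levels = 2 ** adc_bits - 1
    clipped = np.clip(psum, -alpha, alpha)
    step = 2 * alpha / levels
    return np.round(clipped / step) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 256))
    x = rng.normal(size=(256,))
    W_q, _ = quantize_weights_arraywise(W, n_bits=4)
    psum = W_q[:128, :].T @ x[:128]  # partial sum produced by the first 128-row array
    psum_q = quantize_partial_sum(psum, alpha=3.0, adc_bits=4)
    print("partial-sum quantization MSE:", np.mean((psum - psum_q) ** 2))

In this sketch, each crossbar-sized tile receives its own scale factor (the arraywise granularity), and the ADC is modeled as a low-resolution uniform quantizer applied to clipped partial sums; the bit-level-sparsity-induced activation quantization and the random quantization dropping strategy are not modeled.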
Pages: 189-202
Page count: 14