CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The novel computing-in-memory (CIM) technology has demonstrated significant potential in enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the proposed framework is demonstrated by experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy, compared to conventional quantization methods.
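The abstract describes the four techniques only at a high level. As a rough illustration of two of them (arraywise weight quantization granularity and partial-sum clipping to relax ADC resolution), the following NumPy sketch uses an assumed 128x128 crossbar tile, 4-bit weights, a 6-bit ADC, and a quantile-based clip value alpha standing in for the paper's learned, reparametrized clipping; these settings and function names are illustrative assumptions, not the authors' code or configuration.

```python
import numpy as np

# Hypothetical sketch of two ideas named in the abstract; a minimal
# illustration only, not the paper's implementation.

def quantize_weights_arraywise(W, array_rows=128, array_cols=128, n_bits=4):
    """Symmetric uniform quantization with one scale per crossbar-sized tile
    (arraywise granularity), rather than one scale per layer or channel."""
    q_max = 2 ** (n_bits - 1) - 1
    Wq = np.empty_like(W, dtype=np.float64)
    scales = {}
    for r in range(0, W.shape[0], array_rows):
        for c in range(0, W.shape[1], array_cols):
            tile = W[r:r + array_rows, c:c + array_cols]
            scale = np.max(np.abs(tile)) / q_max + 1e-12
            Wq[r:r + array_rows, c:c + array_cols] = (
                np.clip(np.round(tile / scale), -q_max, q_max) * scale
            )
            scales[(r, c)] = scale
    return Wq, scales

def quantize_partial_sum(psum, alpha, adc_bits=6):
    """Clip partial sums to [-alpha, alpha] and round to the ADC resolution.
    In the paper the clip value is learned via a reparametrized clipping
    function; here it is simply supplied by the caller."""
    levels = 2 ** (adc_bits - 1) - 1
    step = alpha / levels
    return np.clip(np.round(psum / step), -levels, levels) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256))
    x = rng.standard_normal(256)
    Wq, _ = quantize_weights_arraywise(W)
    # Emulate the column-wise partial sums produced by the first 128x128 tile.
    psum = Wq[:128, :128].T @ x[:128]
    psum_q = quantize_partial_sum(psum, alpha=float(np.quantile(np.abs(psum), 0.99)))
    print("max |psum - psum_q|:", float(np.max(np.abs(psum - psum_q))))
```

The per-tile scale mirrors the fact that each physical array produces an independent partial sum that must pass through an ADC, which is why both the quantization granularity and the clip range are chosen at the array level in this sketch.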
Pages: 189-202
Number of pages: 14
Related Papers
50 records in total
  • [41] Tetris: A Heuristic Static Memory Management Framework for Uniform Memory Multicore Neural Network Accelerators
    Chen, Xiao-Bing
    Qi, Hao
    Peng, Shao-Hui
    Zhuang, Yi-Min
    Zhi, Tian
    Chen, Yun-Ji
    Journal of Computer Science and Technology, 2022, 37 (6) : 1255 - 1270
  • [42] Parallel Convolutional Neural Network (CNN) Accelerators Based on Stochastic Computing
    Zhang, Yawen
    Zhang, Xinyue
    Song, Jiahao
    Wang, Yuan
    Huang, Ru
    Wang, Runsheng
    PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 19 - 24
  • [43] IMEC: A Memory-Efficient Convolution Algorithm For Quantised Neural Network Accelerators
    Wadhwa, Eashan
    Khandelwal, Shashwat
    Shreejith, Shanker
    2022 IEEE 33RD INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP), 2022, : 115 - 121
  • [44] Vessel Identification using Convolutional Neural Network-based Hardware Accelerators
    Boyer, Alexandre
    Abiemona, Rami
    Bolic, Miodrag
    Petriu, Emil
    2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND VIRTUAL ENVIRONMENTS FOR MEASUREMENT SYSTEMS AND APPLICATIONS (IEEE CIVEMSA 2021), 2021,
  • [45] DSE-Based Hardware Trojan Attack for Neural Network Accelerators on FPGAs
    Guo, Chao
    Yanagisawa, Masao
    Shi, Youhua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [46] Hardware-Efficient Residual Neural Network Execution in Line-Buffer Depth-First Processing
    Shi, Man
    Houshmand, Pouya
    Mei, Linyan
    Verhelst, Marian
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2021, 11 (04) : 690 - 700
  • [47] CMN: a co-designed neural architecture search for efficient computing-in-memory-based mixture-of-experts (vol 67, 200405, 2024)
    Han, Shihao
    Liu, Sishuo
    Du, Shucheng
    Li, Mingzi
    Ye, Zijian
    Xu, Xiaoxin
    Li, Yi
    Wang, Zhongrui
    Shang, Dashan
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (11)
  • [48] Improved Deep Neural Network hardware-accelerators based on Non-Volatile-Memory: the Local Gains technique
    Boybat, Irem
    di Nolfo, Carmelo
    Ambrogio, Stefano
    Bodini, Martina
    Farinha, Nathan C. P.
    Shelby, Robert M.
    Narayanan, Pritish
    Sidler, Severin
    Tsai, Hsinyu
    Leblebici, Yusuf
    Burr, Geoffrey W.
    2017 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2017, : 52 - 59
  • [49] PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators
    Zhu, Yu
    Zhu, Zhenhua
    Dai, Guohao
    Tu, Fengbin
    Sun, Hanbo
    Cheng, Kwang-Ting
    Yang, Huazhong
    Wang, Yu
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [50] BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator
    Chang, Cheng-Yang
    Huang, Chi-Tse
    Chuang, Yu-Chuan
    Chou, Kuang-Chao
    Wu, An-Yeu
    29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024, : 545 - 550