CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The novel computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, because of the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional neural network (NN) quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework targets the fundamental computing elements of CIM hardware: inputs, weights, and outputs (i.e., activations, weights, and partial sums in NNs) with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization dropping strategy. The effectiveness of the proposed framework is demonstrated by experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency is improved by up to 222%, with a 58.97% improvement in accuracy, compared with conventional quantization methods.
Pages: 189 - 202
Page count: 14
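
To make the techniques summarized in the abstract concrete, the following is a minimal NumPy sketch of three of them: arraywise weight quantization (one scale per crossbar-sized tile), clip-and-quantize of partial sums before a low-resolution ADC, and QDrop-style random quantization dropping for PTQ calibration. The tile size (64x64), bit widths, the fixed clipping threshold `alpha`, the drop probability, and all function names are illustrative assumptions, not the paper's actual implementation; in particular, the paper's clipping threshold is reparametrized and learned, whereas here it is a fixed constant.

```python
import numpy as np

def quantize_arraywise(weights, array_rows=64, array_cols=64, n_bits=4):
    """Fake-quantize a weight matrix with one symmetric scale per CIM sub-array.

    Each (array_rows x array_cols) tile gets its own scale, mirroring how
    weights are mapped onto separate crossbar arrays. Returns the dequantized
    weights and the per-tile scales. (Hypothetical sketch, not the paper's code.)
    """
    qmax = 2 ** (n_bits - 1) - 1
    rows, cols = weights.shape
    out = np.empty_like(weights)
    scales = {}
    for r in range(0, rows, array_rows):
        for c in range(0, cols, array_cols):
            tile = weights[r:r + array_rows, c:c + array_cols]
            scale = np.abs(tile).max() / qmax + 1e-12
            q_tile = np.clip(np.round(tile / scale), -qmax - 1, qmax)
            out[r:r + array_rows, c:c + array_cols] = q_tile * scale
            scales[(r, c)] = scale
    return out, scales

def quantize_partial_sum(psum, alpha=8.0, n_bits=6):
    """Clip-and-quantize partial sums as if read out by a low-resolution ADC.

    `alpha` stands in for the paper's reparametrized clipping threshold;
    here it is simply a fixed, assumed value.
    """
    levels = 2 ** n_bits - 1
    step = 2 * alpha / levels
    clipped = np.clip(psum, -alpha, alpha)
    return np.round(clipped / step) * step

def random_quant_drop(x_fp, x_q, drop_prob=0.5):
    """QDrop-style random quantization dropping for PTQ calibration.

    Element-wise, with probability `drop_prob` the full-precision value is
    kept instead of the quantized one during calibration forward passes.
    """
    keep_fp = np.random.rand(*x_fp.shape) < drop_prob
    return np.where(keep_fp, x_fp, x_q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(128, 128)).astype(np.float32)
    w_q, tile_scales = quantize_arraywise(w)
    print("weight quantization error:", np.abs(w - w_q).mean())

    x = rng.normal(size=(64,)).astype(np.float32)
    psum = w_q[:64, :64].T @ x           # pre-ADC partial sums from one 64x64 array
    psum_q = quantize_partial_sum(psum)  # readout through a reduced-resolution ADC
    print("partial-sum quantization error:", np.abs(psum - psum_q).mean())

    mixed = random_quant_drop(w, w_q, drop_prob=0.5)
    print("mixed calibration tensor error:", np.abs(w - mixed).mean())
```

The arraywise granularity in this sketch simply follows the mapping of weights onto fixed-size crossbar tiles, so each physical array carries a single scale factor; the choice of 64x64 tiles and 4-/6-bit resolutions is only for illustration.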