CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC): TP3 [Computing technology; computer technology]
Discipline Classification Code: 0812
Abstract
The emerging computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional neural network (NN) quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework targets the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs) with four innovative techniques. First, bit-level-sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization dropping strategy. The effectiveness of the proposed framework has been demonstrated through experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy, compared to conventional quantization methods.
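The abstract describes arraywise weight quantization and clipped partial-sum quantization only at a high level. The NumPy sketch below illustrates the general idea under illustrative assumptions: a fixed 128x128 crossbar tile size, symmetric uniform quantizers, and a fixed clipping threshold alpha standing in for the paper's learned, reparametrized clipping function. The function names and parameter values are hypothetical and are not taken from the paper.

import numpy as np

def quantize_weights_arraywise(W, n_bits=4, array_rows=128, array_cols=128):
    """Arraywise weight quantization sketch: one scale per crossbar-sized tile.

    Assumes the weight matrix is partitioned into (array_rows x array_cols)
    tiles, each mapped to one CIM array; tile size and symmetric uniform
    quantization are illustrative assumptions, not the paper's exact scheme.
    """
    q_max = 2 ** (n_bits - 1) - 1
    W_q = np.zeros_like(W, dtype=np.float64)
    scales = {}
    for r0 in range(0, W.shape[0], array_rows):
        for c0 in range(0, W.shape[1], array_cols):
            tile = W[r0:r0 + array_rows, c0:c0 + array_cols]
            t_max = np.abs(tile).max()
            scale = t_max / q_max if t_max > 0 else 1.0
            q = np.clip(np.round(tile / scale), -q_max - 1, q_max)
            W_q[r0:r0 + array_rows, c0:c0 + array_cols] = q * scale
            scales[(r0, c0)] = scale
    return W_q, scales

def quantize_partial_sum(psum, alpha=3.0, adc_bits=4):
    """Clipped uniform quantization of analog partial sums (ADC model sketch).

    alpha acts as a fixed clipping threshold; in the paper it is learned via a
    reparametrized clipping function, which is omitted here for brevity.
    """
    levels = 2 ** adc_bits - 1
    clipped = np.clip(psum, -alpha, alpha)
    step = 2 * alpha / levels
    return np.round(clipped / step) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 256))
    x = rng.normal(size=(256,))
    W_q, _ = quantize_weights_arraywise(W, n_bits=4)
    psum = W_q[:128, :].T @ x[:128]  # partial sum produced by the first 128-row array
    psum_q = quantize_partial_sum(psum, alpha=3.0, adc_bits=4)
    print("partial-sum quantization MSE:", np.mean((psum - psum_q) ** 2))

In this sketch, each crossbar-sized tile receives its own scale factor (the arraywise granularity), and the ADC is modeled as a low-resolution uniform quantizer applied to clipped partial sums; the bit-level-sparsity-induced activation quantization and the random quantization dropping strategy are not modeled.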
Pages: 189-202
Page count: 14