CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
The novel computing-in-memory (CIM) technology has demonstrated significant potential in enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the proposed framework is demonstrated by experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy, compared to conventional quantization methods.
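The abstract describes the four techniques only at a high level. As a rough illustration of two of them (arraywise weight quantization granularity and partial-sum clipping to relax ADC resolution), the following NumPy sketch uses an assumed 128x128 crossbar tile, 4-bit weights, a 6-bit ADC, and a quantile-based clip value alpha standing in for the paper's learned, reparametrized clipping; these settings and function names are illustrative assumptions, not the authors' code or configuration.

```python
import numpy as np

# Hypothetical sketch of two ideas named in the abstract; a minimal
# illustration only, not the paper's implementation.

def quantize_weights_arraywise(W, array_rows=128, array_cols=128, n_bits=4):
    """Symmetric uniform quantization with one scale per crossbar-sized tile
    (arraywise granularity), rather than one scale per layer or channel."""
    q_max = 2 ** (n_bits - 1) - 1
    Wq = np.empty_like(W, dtype=np.float64)
    scales = {}
    for r in range(0, W.shape[0], array_rows):
        for c in range(0, W.shape[1], array_cols):
            tile = W[r:r + array_rows, c:c + array_cols]
            scale = np.max(np.abs(tile)) / q_max + 1e-12
            Wq[r:r + array_rows, c:c + array_cols] = (
                np.clip(np.round(tile / scale), -q_max, q_max) * scale
            )
            scales[(r, c)] = scale
    return Wq, scales

def quantize_partial_sum(psum, alpha, adc_bits=6):
    """Clip partial sums to [-alpha, alpha] and round to the ADC resolution.
    In the paper the clip value is learned via a reparametrized clipping
    function; here it is simply supplied by the caller."""
    levels = 2 ** (adc_bits - 1) - 1
    step = alpha / levels
    return np.clip(np.round(psum / step), -levels, levels) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256))
    x = rng.standard_normal(256)
    Wq, _ = quantize_weights_arraywise(W)
    # Emulate the column-wise partial sums produced by the first 128x128 tile.
    psum = Wq[:128, :128].T @ x[:128]
    psum_q = quantize_partial_sum(psum, alpha=float(np.quantile(np.abs(psum), 0.99)))
    print("max |psum - psum_q|:", float(np.max(np.abs(psum - psum_q))))
```

The per-tile scale mirrors the fact that each physical array produces an independent partial sum that must pass through an ADC, which is why both the quantization granularity and the clip range are chosen at the array level in this sketch.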
Pages: 189-202
Number of pages: 14
Related Papers
50 records in total
  • [41] Tetris: A Heuristic Static Memory Management Framework for Uniform Memory Multicore Neural Network Accelerators
    Chen, Xiao-Bing
    Qi, Hao
    Peng, Shao-Hui
    Zhuang, Yi-Min
    Zhi, Tian
    Chen, Yun-Ji
    Journal of Computer Science and Technology, 2022, 37 (6) : 1255 - 1270
  • [42] Parallel Convolutional Neural Network (CNN) Accelerators Based on Stochastic Computing
    Zhang, Yawen
    Zhang, Xinyue
    Song, Jiahao
    Wang, Yuan
    Huang, Ru
    Wang, Runsheng
    PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 19 - 24
  • [43] IMEC: A Memory-Efficient Convolution Algorithm For Quantised Neural Network Accelerators
    Wadhwa, Eashan
    Khandelwal, Shashwat
    Shreejith, Shanker
    2022 IEEE 33RD INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP), 2022, : 115 - 121
  • [44] Vessel Identification using Convolutional Neural Network-based Hardware Accelerators
    Boyer, Alexandre
    Abiemona, Rami
    Bolic, Miodrag
    Petriu, Emil
    2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND VIRTUAL ENVIRONMENTS FOR MEASUREMENT SYSTEMS AND APPLICATIONS (IEEE CIVEMSA 2021), 2021,
  • [45] DSE-Based Hardware Trojan Attack for Neural Network Accelerators on FPGAs
    Guo, Chao
    Yanagisawa, Masao
    Shi, Youhua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [46] Hardware-Efficient Residual Neural Network Execution in Line-Buffer Depth-First Processing
    Shi, Man
    Houshmand, Pouya
    Mei, Linyan
    Verhelst, Marian
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2021, 11 (04) : 690 - 700
  • [47] CMN: a co-designed neural architecture search for efficient computing-in-memory-based mixture-of-experts (vol 67, 200405, 2024)
    Han, Shihao
    Liu, Sishuo
    Du, Shucheng
    Li, Mingzi
    Ye, Zijian
    Xu, Xiaoxin
    Li, Yi
    Wang, Zhongrui
    Shang, Dashan
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (11)
  • [48] Improved Deep Neural Network hardware-accelerators based on Non-Volatile-Memory: the Local Gains technique
    Boybat, Irem
    di Nolfo, Carmelo
    Ambrogio, Stefano
    Bodini, Martina
    Farinha, Nathan C. P.
    Shelby, Robert M.
    Narayanan, Pritish
    Sidler, Severin
    Tsai, Hsinyu
    Leblebici, Yusuf
    Burr, Geoffrey W.
    2017 IEEE INTERNATIONAL CONFERENCE ON REBOOTING COMPUTING (ICRC), 2017, : 52 - 59
  • [49] PIM-HLS: An Automatic Hardware Generation Tool for Heterogeneous Processing-In-Memory-based Neural Network Accelerators
    Zhu, Yu
    Zhu, Zhenhua
    Dai, Guohao
    Tu, Fengbin
    Sun, Hanbo
    Cheng, Kwang-Ting
    Yang, Huazhong
    Wang, Yu
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [50] BFP-CIM: Data-Free Quantization with Dynamic Block-Floating-Point Arithmetic for Energy-Efficient Computing-In-Memory-based Accelerator
    Chang, Cheng-Yang
    Huang, Chi-Tse
    Chuang, Yu-Chuan
    Chou, Kuang-Chao
    Wu, An-Yeu
    29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024, : 545 - 550