CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators

Cited by: 4
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Zhao, Weisheng [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing 100191, Peoples R China
Keywords
Quantization (signal); Hardware; Artificial neural networks; Common Information Model (computing); Training; Memory management; Computational efficiency; Bit-level sparsity; computing-in-memory (CIM); neural network quantization; post-training quantization (PTQ); quantization granularity; reparametrization;
DOI
10.1109/TCAD.2023.3298705
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The novel computing-in-memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, because of the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional neural network (NN) quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This article proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework targets the fundamental computing elements of CIM hardware: inputs, weights, and outputs (i.e., activations, weights, and partial sums in NNs) with four innovative techniques. First, bit-level sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative arraywise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization dropping strategy. The effectiveness of the proposed framework is demonstrated by experimental results on various NNs and datasets (CIFAR10, CIFAR100, and ImageNet). In typical cases, hardware efficiency is improved by up to 222%, with a 58.97% improvement in accuracy, compared with conventional quantization methods.
Pages: 189 - 202
Page count: 14
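
To make the techniques summarized in the abstract concrete, the following is a minimal NumPy sketch of three of them: arraywise weight quantization (one scale per crossbar-sized tile), clip-and-quantize of partial sums before a low-resolution ADC, and QDrop-style random quantization dropping for PTQ calibration. The tile size (64x64), bit widths, the fixed clipping threshold `alpha`, the drop probability, and all function names are illustrative assumptions, not the paper's actual implementation; in particular, the paper's clipping threshold is reparametrized and learned, whereas here it is a fixed constant.

```python
import numpy as np

def quantize_arraywise(weights, array_rows=64, array_cols=64, n_bits=4):
    """Fake-quantize a weight matrix with one symmetric scale per CIM sub-array.

    Each (array_rows x array_cols) tile gets its own scale, mirroring how
    weights are mapped onto separate crossbar arrays. Returns the dequantized
    weights and the per-tile scales. (Hypothetical sketch, not the paper's code.)
    """
    qmax = 2 ** (n_bits - 1) - 1
    rows, cols = weights.shape
    out = np.empty_like(weights)
    scales = {}
    for r in range(0, rows, array_rows):
        for c in range(0, cols, array_cols):
            tile = weights[r:r + array_rows, c:c + array_cols]
            scale = np.abs(tile).max() / qmax + 1e-12
            q_tile = np.clip(np.round(tile / scale), -qmax - 1, qmax)
            out[r:r + array_rows, c:c + array_cols] = q_tile * scale
            scales[(r, c)] = scale
    return out, scales

def quantize_partial_sum(psum, alpha=8.0, n_bits=6):
    """Clip-and-quantize partial sums as if read out by a low-resolution ADC.

    `alpha` stands in for the paper's reparametrized clipping threshold;
    here it is simply a fixed, assumed value.
    """
    levels = 2 ** n_bits - 1
    step = 2 * alpha / levels
    clipped = np.clip(psum, -alpha, alpha)
    return np.round(clipped / step) * step

def random_quant_drop(x_fp, x_q, drop_prob=0.5):
    """QDrop-style random quantization dropping for PTQ calibration.

    Element-wise, with probability `drop_prob` the full-precision value is
    kept instead of the quantized one during calibration forward passes.
    """
    keep_fp = np.random.rand(*x_fp.shape) < drop_prob
    return np.where(keep_fp, x_fp, x_q)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(128, 128)).astype(np.float32)
    w_q, tile_scales = quantize_arraywise(w)
    print("weight quantization error:", np.abs(w - w_q).mean())

    x = rng.normal(size=(64,)).astype(np.float32)
    psum = w_q[:64, :64].T @ x           # pre-ADC partial sums from one 64x64 array
    psum_q = quantize_partial_sum(psum)  # readout through a reduced-resolution ADC
    print("partial-sum quantization error:", np.abs(psum - psum_q).mean())

    mixed = random_quant_drop(w, w_q, drop_prob=0.5)
    print("mixed calibration tensor error:", np.abs(w - mixed).mean())
```

The arraywise granularity in this sketch simply follows the mapping of weights onto fixed-size crossbar tiles, so each physical array carries a single scale factor; the choice of 64x64 tiles and 4-/6-bit resolutions is only for illustration.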