Exploring Bit-Level Sparsity for Partial Sum Quantization in Computing-In-Memory Accelerator

Cited: 1
Authors
Bai, Jinyu [1 ]
Sun, Sifan [1 ]
Kang, Wang [1 ]
Affiliations
[1] Beihang Univ, Sch Integrated Circuit Sci & Engn, Beijing, Peoples R China
Keywords
Computing-In-Memory (CIM); partial sum quantization (PSQ); bit-level sparsity; post-training quantization (PTQ);
DOI
10.1109/NVMSA58981.2023.00021
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Computing-In-Memory (CIM) has demonstrated great potential in boosting the performance and energy efficiency of convolutional neural networks. However, due to the limited size and precision of its memory array, the input and weight matrices of convolution operations have to be split into sub-matrices or even binary sub-matrices, especially when using bit-slicing and single-level cells (SLCs). As a result, a large number of partial sums are generated. To maintain high computing precision, high-resolution analog-to-digital converters (ADCs) are used to read out these partial sums, at the cost of considerable area and energy overhead. Partial sum quantization (PSQ), a technique that can greatly reduce the required ADC resolution, remains sparsely studied. This paper proposes a novel PSQ approach for CIM-based accelerators that exploits the bit-level sparsity of neural networks. To find the optimal clipping threshold for the ADCs, a reparametrized clipping function is also proposed. Finally, we develop a general post-training quantization framework for PSQ-CIM. Experiments on a variety of neural networks and datasets show that, in a typical case (ResNet18 on ImageNet), the required ADC resolution can be reduced to 2 bits with little accuracy loss (~0.92%) and hardware efficiency can be improved by 199.7%.
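The flow described in the abstract can be illustrated with a small numerical example. Below is a minimal Python/NumPy sketch (not the authors' implementation) of bit-sliced CIM partial sums being clipped and quantized to a low ADC resolution; the names bit_slice, psq_quantize, clip_threshold, and adc_bits are hypothetical, and the clipping threshold is fixed by hand rather than found by the paper's reparametrized clipping function.

import numpy as np

def bit_slice(weights, n_bits):
    # Split unsigned integer weights into binary bit planes (LSB first),
    # mimicking single-level-cell (SLC) storage where each cell holds one bit.
    return np.stack([(weights >> b) & 1 for b in range(n_bits)], axis=0)

def psq_quantize(partial_sum, clip_threshold, adc_bits):
    # Emulate a low-resolution ADC: clip the partial sum at clip_threshold,
    # then uniformly quantize the clipped value into 2**adc_bits - 1 steps.
    levels = 2 ** adc_bits - 1
    step = clip_threshold / levels
    clipped = np.clip(partial_sum, 0.0, clip_threshold)
    return np.round(clipped / step) * step  # dequantized value for accumulation

rng = np.random.default_rng(0)
weights = rng.integers(0, 16, size=64)     # 4-bit unsigned weights on one CIM column
inputs = rng.integers(0, 2, size=64)       # one binary input bit plane
w_slices = bit_slice(weights, n_bits=4)    # shape (4, 64): one row per weight bit

# Each weight slice produces one partial sum; bit-level sparsity keeps most of
# them small, so a tight clip threshold loses little information.
partial_sums = (w_slices @ inputs).astype(float)
quantized = psq_quantize(partial_sums, clip_threshold=16.0, adc_bits=2)

# Shift-and-add the dequantized partial sums according to their bit significance.
result = sum(q * (1 << b) for b, q in enumerate(quantized))
print(partial_sums, quantized, result)

In the paper, the clipping threshold would instead be optimized via the proposed reparametrized clipping function inside the post-training quantization framework, rather than fixed by hand as in this toy example.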
Pages: 32-37
Number of pages: 6
Related Papers
50 records in total
  • [1] Extreme Partial-Sum Quantization for Analog Computing-In-Memory Neural Network Accelerators
    Kim, Yulhwa
    Kim, Hyungjun
    Kim, Jae-Joon
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2022, 18 (04)
  • [2] Partial Sum Quantization for Computing-In-Memory-Based Neural Network Accelerator
    Bai, Jinyu
    Xue, Wenlu
    Fan, Yunqian
    Sun, Sifan
    Kang, Wang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (08) : 3049 - 3053
  • [3] Saturation RRAM Leveraging Bit-level Sparsity Resulting from Term Quantization
    McDanel, Bradley
    Zhang, Sai Qian
    Kung, H. T.
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [4] Bit-Transformer: Transforming Bit-level Sparsity into Higher Performance in ReRAM-based Accelerator
    Liu, Fangxin
    Zhao, Wenbo
    He, Zhezhi
    Wang, Zongwu
    Zhao, Yilong
    Chen, Yongbiao
    Jiang, Li
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [5] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    2023 IEEE 30TH SYMPOSIUM ON COMPUTER ARITHMETIC, ARITH 2023, 2023, : 101 - 101
  • [6] A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity
    Sakellariou, Vasilis
    Paliouras, Vassilis
    Kouretas, Ioannis
    Saleh, Hani
    Stouraitis, Thanos
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (02) : 667 - 683
  • [7] Towards CIM-friendly and Energy-Efficient DNN Accelerator via Bit-level Sparsity
    Karimzadeh, Foroozan
    Raychowdhury, Arijit
    PROCEEDINGS OF THE 2022 IFIP/IEEE 30TH INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC), 2022,
  • [8] Effect of Bit-Level Correlation In Stochastic Computing
    Parhi, Megha
    Riedel, Marc D.
    Parhi, Keshab K.
    2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2015, : 463 - 467
  • [9] A 5T-SRAM Based Computing-In-Memory Macro Featuring Partial Sum Boosting and Analog Non-Uniform Quantization
    Xin, Guoqiang
    Tan, Fei
    Li, Junde
    Chen, Junren
    Yu, Wei-Han
    Un, Ka-Fai
    Martins, Rui P.
    Mak, Pui-In
    2024 IEEE 67TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, MWSCAS 2024, 2024, : 882 - 887
  • [10] An 11T1C Bit-Level-Sparsity-Aware Computing-in-Memory Macro With Adaptive Conversion Time and Computation Voltage
    Lin, Ye
    Li, Yuandong
    Zhang, Heng
    Ma, He
    Lv, Jingjing
    Jiang, Anying
    Du, Yuan
    Du, Li
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (11) : 4985 - 4995