Compact Powers-of-Two: An Efficient Non-Uniform Quantization for Deep Neural Networks

Cited by: 0
Authors
Geng, Xinkuang [1]
Liu, Siting [2]
Jiang, Jianfei [1]
Jiang, Kai [3]
Jiang, Honglan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Micronano Elect, Shanghai, Peoples R China
[2] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
[3] Inspur Acad Sci & Technol, Jinan, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.23919/DATE58400.2024.10546652
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To reduce the computation and memory demands of deep neural networks (DNNs), various quantization techniques have been extensively investigated. However, conventional methods cannot effectively capture the intrinsic data characteristics of DNNs, leading to severe accuracy degradation under low-bit-width quantization. To better align with the bell-shaped data distribution, we propose an efficient non-uniform quantization scheme, denoted as compact powers-of-two (CPoT). To avoid the rigid resolution inherent in powers-of-two (PoT) without introducing new issues, we add a fractional part to its encoding, followed by a biasing operation that eliminates the unrepresentable region around 0. This approach balances the grid resolution between the vicinity of 0 and the edge region. To facilitate hardware implementation, we optimize the dot product for CPoT based on the computational characteristics of quantized DNNs: the precomputable terms are extracted and incorporated into the bias. Consequently, a multiply-accumulate (MAC) unit is designed for CPoT using shifters and look-up tables (LUTs). The experimental results show that, even with a certain level of approximation, the proposed CPoT outperforms state-of-the-art methods in data-free quantization (DFQ), a post-training quantization (PTQ) technique focusing on data privacy and computational efficiency. Furthermore, in hardware implementation, CPoT is more efficient in area and power than the other methods.
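The grid behavior described in the abstract can be illustrated with a minimal sketch. It is not the CPoT encoding from the paper, whose exact formula is not given in this record. The function names, the bit widths, and the grid form `(1 + f / 2^frac_bits) * 2^-e` shifted by its smallest magnitude are all assumptions chosen only to show the two ideas: a plain PoT grid has no level at 0, and a fractional part followed by a bias can fill that gap.

```python
def pot_levels(exp_bits):
    """Plain powers-of-two magnitudes 2^-e for e = 0 .. 2^exp_bits - 1."""
    return sorted(2.0 ** -e for e in range(2 ** exp_bits))


def biased_fractional_levels(exp_bits, frac_bits):
    """Hypothetical grid (assumption, not the paper's formula):
    magnitudes (1 + f / 2^frac_bits) * 2^-e, shifted down by the
    smallest magnitude so that 0 becomes representable."""
    steps = 2 ** frac_bits
    mags = sorted(
        (1.0 + f / steps) * 2.0 ** -e
        for e in range(2 ** exp_bits)
        for f in range(steps)
    )
    return [m - mags[0] for m in mags]


def quantize(x, mags):
    """Round |x| to the nearest magnitude in the grid and restore the sign."""
    nearest = min(mags, key=lambda m: abs(abs(x) - m))
    return nearest if x >= 0 else -nearest


pot = pot_levels(2)                      # smallest level is 0.125, never 0
grid = biased_fractional_levels(2, 1)    # starts at 0 after the bias
print(quantize(0.01, pot), quantize(0.01, grid))
```

With the plain PoT grid, a value close to 0 is forced up to the smallest power of two, while the biased grid maps it to 0. The biased grid also places extra levels between consecutive powers of two, which is the kind of finer edge resolution the abstract attributes to the fractional part.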
Pages: 6
Related papers
50 in total
  • [31] Two Novel Non-Uniform Quantizers with Application in Post-Training Quantization
    Peric, Zoran
    Aleksic, Danijela
    Nikolic, Jelena
    Tomic, Stefan
    MATHEMATICS, 2022, 10 (19)
  • [32] Bit-Weight Adjustment for Bridging Uniform and Non-Uniform Quantization to Build Efficient Image Classifiers
    Zhou, Xichuan
    Duan, Yunmo
    Ding, Rui
    Wang, Qianchuan
    Wang, Qi
    Qin, Jian
    Liu, Haijun
    ELECTRONICS, 2023, 12 (24)
  • [33] On Practical Approach to Uniform Quantization of Non-redundant Neural Networks
    Goncharenko, Alexander
    Denisov, Andrey
    Alyamkin, Sergey
    Terentev, Evgeny
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: DEEP LEARNING, PT II, 2019, 11728 : 349 - 360
  • [34] A Non-uniform Quantization Filter Based on Adaptive Quantization Interval in WSNs
    Wen, Chenglin
    Zhu, Chaoyang
    Xu, Daxing
    Quan, Lidi
    COGNITIVE SYSTEMS AND SIGNAL PROCESSING, ICCSIP 2016, 2017, 710 : 595 - 605
  • [35] Hybrid and non-uniform quantization methods using retro synthesis data for efficient inference
    Pratap, Tej G. V. S. L.
    Kumar, Raja
    Pradeep, N. S.
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [36] A Non-Intrusive Load Monitoring Algorithm Based on Non-Uniform Sampling of Power Data and Deep Neural Networks
    Fagiani, Marco
    Bonfigli, Roberto
    Principi, Emanuele
    Squartini, Stefano
    Mandolini, Luigi
    ENERGIES, 2019, 12 (07)
  • [37] Efficient Simulation of Non-uniform Cellular Automata with a Convolutional Neural Network
    Rollier, Michiel
    Daly, Aisling J.
    Bruno, Odemir M.
    Baetens, Jan M.
    CELLULAR AUTOMATA, ACRI 2024, 2024, 14978 : 121 - 131
  • [38] On non-uniform rational B-splines surface neural networks
    Cheng, Ming-Yang
    Wu, Hung-Wen
    Su, Alvin Wen-Yu
    NEURAL PROCESSING LETTERS, 2008, 28 (01) : 1 - 15
  • [39] Delta-Sigma Modulator with Non-Uniform Quantization
    Hagiwara, Mao
    Kitayabu, Toru
    Ishikawa, Hiroyasu
    Shirai, Hiroshi
    2011 IEEE RADIO AND WIRELESS SYMPOSIUM (RWS), 2011, : 351 - 354
  • [40] On Non-Uniform Rational B-Splines Surface Neural Networks
    Ming-Yang Cheng
    Hung-Wen Wu
    Alvin Wen-Yu Su
    Neural Processing Letters, 2008, 28 : 1 - 15