Compact Powers-of-Two: An Efficient Non-Uniform Quantization for Deep Neural Networks

Cited by: 0

Authors:

Geng, Xinkuang [1]

Liu, Siting [2]

Jiang, Jianfei [1]

Jiang, Kai [3]

Jiang, Honglan [1]

Affiliations:

[1] Shanghai Jiao Tong University, Department of Micro/Nano Electronics, Shanghai, People's Republic of China

[2] ShanghaiTech University, School of Information Science and Technology, Shanghai, People's Republic of China

[3] Inspur Academy of Science and Technology, Jinan, Shandong, People's Republic of China

Source:

2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024

Funding:

National Natural Science Foundation of China


DOI:

10.23919/DATE58400.2024.10546652

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

To reduce the computation and memory demands of deep neural networks (DNNs), various quantization techniques have been extensively investigated. However, conventional methods cannot effectively capture the intrinsic data characteristics of DNNs, leading to significant accuracy degradation under low-bit-width quantization. To better align with the bell-shaped distribution of the data in DNNs, we propose an efficient non-uniform quantization scheme, denoted as compact powers-of-two (CPoT). To avoid the rigid resolution inherent in powers-of-two (PoT) quantization without introducing new issues, we add a fractional part to its encoding, followed by a biasing operation that eliminates the unrepresentable region around 0. This approach effectively balances the grid resolution between the vicinity of 0 and the edge region. To facilitate hardware implementation, we optimize the dot product for CPoT based on the computational characteristics of quantized DNNs, extracting the precomputable terms and incorporating them into the bias. Consequently, a multiply-accumulate (MAC) unit is designed for CPoT using shifters and look-up tables (LUTs). The experimental results show that, even with a certain level of approximation, the proposed CPoT outperforms state-of-the-art methods in data-free quantization (DFQ), a post-training quantization (PTQ) technique focusing on data privacy and computational efficiency. Furthermore, CPoT demonstrates superior area and power efficiency compared to other methods in hardware implementation.
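
As a rough illustration of the plain PoT baseline that the abstract contrasts against (not of CPoT itself, whose encoding is not given in this record), the following Python sketch builds a PoT grid and rounds values to it. The function names `pot_levels` and `pot_quantize`, the 2-bit exponent width, and the assumption that inputs lie in [-1, 1] are all hypothetical choices made for the example.

```python
import numpy as np

def pot_levels(bits: int) -> np.ndarray:
    """Positive PoT grid for a `bits`-bit exponent code (assumed layout):
    2^0, 2^-1, ..., 2^-(2^bits - 1)."""
    exponents = np.arange(2 ** bits)
    return 2.0 ** (-exponents)

def pot_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Round each |x| to the nearest PoT level and restore the sign.
    Exact zeros stay 0 only because np.sign(0) == 0; the grid itself
    contains no level at 0."""
    levels = pot_levels(bits)
    idx = np.abs(np.abs(x)[..., None] - levels).argmin(axis=-1)
    return np.sign(x) * levels[idx]

levels = pot_levels(2)    # [1.0, 0.5, 0.25, 0.125]
gaps = -np.diff(levels)   # spacing between neighbouring levels: [0.5, 0.25, 0.125]
q = pot_quantize(np.array([0.8, -0.3, 0.01]), bits=2)
print(levels, gaps, q)    # q is [1.0, -0.25, 0.125]
```

In this toy grid the spacing near the edge (0.5 between 0.5 and 1.0) is much coarser than near 0, and a small input such as 0.01 can only be mapped to 0.125, the smallest level. These are the "rigid resolution" and "unrepresentable region around 0" that the abstract says CPoT is designed to address.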


Pages: 6
