Compact Powers-of-Two: An Efficient Non-Uniform Quantization for Deep Neural Networks

Cited by: 0

Authors:

Geng, Xinkuang [1]

Liu, Siting [2]

Jiang, Jianfei [1]

Jiang, Kai [3]

Jiang, Honglan [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Micronano Elect, Shanghai, Peoples R China

[2] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China

[3] Inspur Acad Sci & Technol, Jinan, Shandong, Peoples R China

Source:

2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.23919/DATE58400.2024.10546652

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

To reduce the demands for computation and memory of deep neural networks (DNNs), various quantization techniques have been extensively investigated. However, conventional methods cannot effectively capture the intrinsic data characteristics in DNNs, leading to a high accuracy degradation when employing low-bit-width quantization. In order to better align with the bell-shaped distribution, we propose an efficient non-uniform quantization scheme, denoted as compact powers-of-two (CPoT). Aiming to avoid the rigid resolution inherent in powers-of-two (PoT) without introducing new issues, we add a fractional part to its encoding, followed by a biasing operation to eliminate the unrepresentable region around 0. This approach effectively balances the grid resolution in both the vicinity of 0 and the edge region. To facilitate the hardware implementation, we optimize the dot product for CPoT based on the computational characteristics of the quantized DNNs, where the precomputable terms are extracted and incorporated into bias. Consequently, a multiply-accumulate (MAC) unit is designed for CPoT using shifters and look-up tables (LUTs). The experimental results show that, even with a certain level of approximation, our proposed CPoT outperforms state-of-the-art methods in data-free quantization (DFQ), a post-training quantization (PTQ) technique focusing on data privacy and computational efficiency. Furthermore, CPoT demonstrates superior efficiency in area and power compared to other methods in hardware implementation.
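The "rigid resolution" of plain powers-of-two (PoT) grids mentioned in the abstract can be illustrated with a minimal sketch. It covers only generic PoT quantization on a range normalized to [-1, 1]; it is not the CPoT encoding, whose details are not given in this record. The bit layout (one sign bit, the remaining bits holding an exponent code) and the names `pot_levels` and `pot_quantize` are assumptions made for illustration.

```python
import math
from typing import List


def pot_levels(bits: int) -> List[float]:
    """Magnitudes of a plain PoT grid, largest first.

    Assumption: 1 sign bit; the other (bits - 1) bits hold an exponent
    code e that maps to 2**(-e). This is a common textbook layout, not
    the paper's CPoT format.
    """
    n_codes = 2 ** (bits - 1)
    return [2.0 ** (-e) for e in range(n_codes)]


def pot_quantize(x: float, bits: int) -> float:
    """Round x to the nearest PoT magnitude, keeping its sign."""
    levels = pot_levels(bits)
    magnitude = min(levels, key=lambda q: abs(abs(x) - q))
    return math.copysign(magnitude, x)


def edge_gap(bits: int) -> float:
    """Spacing between the two largest magnitudes of the grid."""
    levels = pot_levels(bits)
    return levels[0] - levels[1]
```

In this sketch, going from 3 to 4 bits only adds magnitudes close to 0, while `edge_gap` stays at 0.5: the two largest levels remain 1.0 and 0.5, so the spacing at the edge of the range does not shrink as bits are added. The abstract describes CPoT as balancing the resolution between the vicinity of 0 and the edge region.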

Pages: 6
