Flexible Quantization for Efficient Convolutional Neural Networks

Cited by: 3

Authors:

Zacchigna, Federico Giordano [1]

Lew, Sergio [2,3]

Lutenberg, Ariel [1,3]

Affiliations:

[1] Univ Buenos Aires, Fac Ingn FIUBA, Lab Sistemas Embebidos LSE, C1063ACV, Buenos Aires, Argentina

[2] Univ Buenos Aires, Fac Ingn FIUBA, Inst Ingn Biomed IIBM, C1063ACV, Buenos Aires, Argentina

[3] Consejo Nacl Invest Cient & Tecn CONICET, C1425FQB, Buenos Aires, Argentina

Source:

ELECTRONICS | 2024 / Vol. 13 / No. 10

Keywords:

CNN; quantization; uniform; non-uniform; mixed-precision; FPGA; ASIC; edge devices; embedded systems

DOI:

10.3390/electronics13101923

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm achieves compression levels equivalent to 2 bits without an accuracy loss, and even levels equivalent to approximately 1.58 bits with a performance loss of only approximately 0.6%.
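As a rough illustration of what "decoupling the quantization levels from the number of bits" can mean, the sketch below computes the equivalent bit-width of a quantizer from its number of levels. It assumes that "equivalent bits" means the base-2 logarithm of the level count, so that 3 levels correspond to about 1.58 bits and 4 levels to 2 bits. The function names and the symmetric, evenly spaced level grid are hypothetical choices made for this example; they are not taken from the paper and do not reproduce the NUUQ algorithm.

```python
import math


def equivalent_bits(n_levels):
    """Equivalent bit-width of a quantizer with n_levels distinct values.

    Assumption for this sketch: equivalent bits = log2(number of levels).
    """
    return math.log2(n_levels)


def uniform_levels(n_levels, w_max=1.0):
    """Evenly spaced levels on [-w_max, w_max] (hypothetical grid).

    n_levels does not have to be a power of two.
    """
    step = 2.0 * w_max / (n_levels - 1)
    return [-w_max + i * step for i in range(n_levels)]


def quantize(weight, levels):
    """Map one weight to the nearest level in the grid."""
    return min(levels, key=lambda q: abs(q - weight))


if __name__ == "__main__":
    for n in (3, 4):
        print(n, "levels ->", round(equivalent_bits(n), 3), "equivalent bits")
    grid = uniform_levels(3)
    print([quantize(w, grid) for w in (-0.8, 0.1, 0.7)])
```

With a level count that is not a power of two, the stored representation can fall between integer bit-widths, which is one way to read the fractional figure quoted in the abstract.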


Pages: 16

50 items in total

[21] An Efficient Accelerator for Sparse Convolutional Neural Networks
You, Weijie
Wu, Chang
2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ASIC (ASICON), 2019
[22] Efficient Computation of Robustness of Convolutional Neural Networks
Arcaini, Paolo
Bombarda, Andrea
Bonfanti, Silvia
Gargantini, Angelo
THIRD IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST 2021), 2021, : 21 - 28
[23] Efficient Implementation of Convolutional Neural Networks on FPGA
Hadnagy, A.
Feher, B.
Kovacshazy, T.
2018 19TH INTERNATIONAL CARPATHIAN CONTROL CONFERENCE (ICCC), 2018, : 359 - 364
[24] Efficient Hardware Acceleration of Convolutional Neural Networks
Kala, S.
Jose, Babita R.
Mathew, Jimson
Nalesh, S.
32ND IEEE INTERNATIONAL SYSTEM ON CHIP CONFERENCE (IEEE SOCC 2019), 2019, : 191 - 192
[25] RNA: A Flexible and Efficient Accelerator Based on Dynamically Reconfigurable Computing for Multiple Convolutional Neural Networks
Yang, Chen
Hou, Jia
Wang, Yizhou
Zhang, Haibo
Wang, Xiaoli
Geng, Li
JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (16)
[26] A Flexible Sparsity-Aware Accelerator with High Sensitivity and Efficient Operation for Convolutional Neural Networks
Yuan, Haiying
Zeng, Zhiyong
Cheng, Junpeng
Li, Minghao
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4370 - 4389
[27] A Flexible Sparsity-Aware Accelerator with High Sensitivity and Efficient Operation for Convolutional Neural Networks
Haiying Yuan
Zhiyong Zeng
Junpeng Cheng
Minghao Li
Circuits, Systems, and Signal Processing, 2022, 41 : 4370 - 4389
[28] General Bitwidth Assignment for Efficient Deep Convolutional Neural Network Quantization
Fei, Wen
Dai, Wenrui
Li, Chenglin
Zou, Junni
Xiong, Hongkai
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5253 - 5267
[29] Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks
Bao, Han
Qin, Yifan
Chen, Jia
Yang, Ling
Li, Jiancong
Zhou, Houji
Li, Yi
Miao, Xiangshui
FRONTIERS IN ELECTRONICS, 2022, 3
[30] Quantization of constrained processor data paths applied to Convolutional Neural Networks
de Bruin, Barry
Zivkovic, Zoran
Corporaal, Henk
2018 21ST EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2018), 2018, : 357 - 364
