Flexible Quantization for Efficient Convolutional Neural Networks

Cited by: 1
Authors
Zacchigna, Federico Giordano [1 ]
Lew, Sergio [2 ,3 ]
Lutenberg, Ariel [1 ,3 ]
Affiliations
[1] Univ Buenos Aires, Fac Ingn FIUBA, Lab Sistemas Embebidos LSE, C1063ACV, Buenos Aires, Argentina
[2] Univ Buenos Aires, Fac Ingn FIUBA, Inst Ingn Biomed IIBM, C1063ACV, Buenos Aires, Argentina
[3] Consejo Nacl Invest Cient & Tecn CONICET, C1425FQB, Buenos Aires, Argentina
Keywords
CNN; quantization; uniform; non-uniform; mixed-precision; FPGA; ASIC; edge devices; embedded systems
DOI
10.3390/electronics13101923
CLC classification
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the number of quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm achieves compression levels equivalent to 2 bits without any accuracy loss, and even levels equivalent to ~1.58 bits with a performance loss of only ~0.6%.
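The abstract's core idea, decoupling the number of quantization levels from the bit width, can be sketched as follows. This is a minimal illustration only: the codebook placement (evenly spaced quantiles of the weight distribution) and all names are illustrative assumptions, not the paper's exact level-placement algorithm.

```python
import math
import random

def nuuq_sketch(weights, levels):
    """Illustrative non-uniform quantizer whose level count is decoupled
    from a power of two: e.g. levels=3 gives an effective cost of
    log2(3) ~= 1.58 bits per weight, as cited in the abstract."""
    ws = sorted(weights)
    n = len(ws)
    # Place codebook values at evenly spaced quantiles of the weight
    # distribution (a simple non-uniform placement; assumption only).
    codebook = [ws[min(n - 1, int(q * (n - 1)))]
                for q in [i / (levels - 1) for i in range(levels)]]
    # Each weight is replaced by its nearest codebook value; storage is
    # an index, so levels need not be a power of two.
    quantized = [min(codebook, key=lambda c: abs(w - c)) for w in weights]
    return quantized, codebook

random.seed(0)
w = [random.gauss(0, 1) for _ in range(1000)]
qw, cb = nuuq_sketch(w, levels=3)            # 3 levels, not 2 or 4
print(len(set(qw)), round(math.log2(3), 2))  # 3 1.58
```

With three levels the weight tensor compresses as if ~1.58 bits were spent per weight, while a fixed-point implementation can still store each weight as a small integer index, which is the uniform-hardware advantage the abstract refers to.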
Pages: 16
Related papers
50 records total
  • [1] Space Efficient Quantization for Deep Convolutional Neural Networks
    Zhao, Dong-Di
    Li, Fan
    Sharif, Kashif
    Xia, Guang-Min
    Wang, Yu
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2019, 34 (02) : 305 - 317
  • [2] Hybrid Approach for Efficient Quantization of Weights in Convolutional Neural Networks
    Seo, Sanghyun
    Kim, Juntae
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018, : 638 - 641
  • [3] Quantization in Graph Convolutional Neural Networks
    Ben Saad, Leila
    Beferull-Lozano, Baltasar
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1855 - 1859
  • [4] An Efficient and Flexible Accelerator Design for Sparse Convolutional Neural Networks
    Xie, Xiaoru
    Lin, Jun
    Wang, Zhongfeng
    Wei, Jinghe
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (07) : 2936 - 2949
  • [5] HadaNets: Flexible Quantization Strategies for Neural Networks
    Akhauri, Yash
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 526 - 534
  • [6] GCNAX: A Flexible and Energy-efficient Accelerator for Graph Convolutional Neural Networks
    Li, Jiajun
    Louri, Ahmed
    Karanth, Avinash
    Bunescu, Razvan
    [J]. 2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 775 - 788
  • [7] Mixed-Clipping Quantization for Convolutional Neural Networks
    Chang, Libo
    [J]. Institute of Computing Technology (33) : 553 - 559
  • [8] An efficient segmented quantization for graph neural networks
    Dai, Yue
    Tang, Xulong
    Zhang, Youtao
    [J]. CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2022, 4 (04) : 461 - 473