Flexible Quantization for Efficient Convolutional Neural Networks

Cited by: 1
Authors
Zacchigna, Federico Giordano [1 ]
Lew, Sergio [2 ,3 ]
Lutenberg, Ariel [1 ,3 ]
Affiliations
[1] Univ Buenos Aires, Fac Ingn FIUBA, Lab Sistemas Embebidos LSE, C1063ACV, Buenos Aires, Argentina
[2] Univ Buenos Aires, Fac Ingn FIUBA, Inst Ingn Biomed IIBM, C1063ACV, Buenos Aires, Argentina
[3] Consejo Nacl Invest Cient & Tecn CONICET, C1425FQB, Buenos Aires, Argentina
Keywords
CNN; quantization; uniform; non-uniform; mixed-precision; FPGA; ASIC; edge devices; embedded systems
DOI
10.3390/electronics13101923
CLC classification
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
This work focuses on the efficient quantization of convolutional neural networks (CNNs). Specifically, we introduce non-uniform uniform quantization (NUUQ), a novel quantization methodology that combines the benefits of non-uniform quantization, such as high compression levels, with the advantages of uniform quantization, which enables an efficient implementation in fixed-point hardware. NUUQ is based on decoupling the number of quantization levels from the number of bits. This decoupling allows for a trade-off between the spatial and temporal complexity of the implementation, which can be leveraged to further reduce the spatial complexity of the CNN without a significant performance loss. Additionally, we explore different quantization configurations and address typical use cases. The NUUQ algorithm achieves compression levels equivalent to 2 bits without any accuracy loss, and even levels equivalent to ~1.58 bits with a performance loss of only ~0.6%.
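The abstract's core idea, decoupling the number of quantization levels from the bit width, can be sketched as follows. This is a minimal illustration only: the codebook placement (evenly spaced quantiles of the weight distribution) and all names are illustrative assumptions, not the paper's exact level-placement algorithm.

```python
import math
import random

def nuuq_sketch(weights, levels):
    """Illustrative non-uniform quantizer whose level count is decoupled
    from a power of two: e.g. levels=3 gives an effective cost of
    log2(3) ~= 1.58 bits per weight, as cited in the abstract."""
    ws = sorted(weights)
    n = len(ws)
    # Place codebook values at evenly spaced quantiles of the weight
    # distribution (a simple non-uniform placement; assumption only).
    codebook = [ws[min(n - 1, int(q * (n - 1)))]
                for q in [i / (levels - 1) for i in range(levels)]]
    # Each weight is replaced by its nearest codebook value; storage is
    # an index, so levels need not be a power of two.
    quantized = [min(codebook, key=lambda c: abs(w - c)) for w in weights]
    return quantized, codebook

random.seed(0)
w = [random.gauss(0, 1) for _ in range(1000)]
qw, cb = nuuq_sketch(w, levels=3)            # 3 levels, not 2 or 4
print(len(set(qw)), round(math.log2(3), 2))  # 3 1.58
```

With three levels the weight tensor compresses as if ~1.58 bits were spent per weight, while a fixed-point implementation can still store each weight as a small integer index, which is the uniform-hardware advantage the abstract refers to.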
Pages: 16
Related papers
50 records total
  • [1] Space Efficient Quantization for Deep Convolutional Neural Networks
    Zhao, Dong-Di
    Li, Fan
    Sharif, Kashif
    Xia, Guang-Min
    Wang, Yu
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2019, 34 (02) : 305 - 317
  • [2] Hybrid Approach for Efficient Quantization of Weights in Convolutional Neural Networks
    Seo, Sanghyun
    Kim, Juntae
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018, : 638 - 641
  • [3] Quantization in Graph Convolutional Neural Networks
    Ben Saad, Leila
    Beferull-Lozano, Baltasar
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1855 - 1859
  • [4] An Efficient and Flexible Accelerator Design for Sparse Convolutional Neural Networks
    Xie, Xiaoru
    Lin, Jun
    Wang, Zhongfeng
    Wei, Jinghe
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (07) : 2936 - 2949
  • [5] HadaNets: Flexible Quantization Strategies for Neural Networks
    Akhauri, Yash
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 526 - 534
  • [6] GCNAX: A Flexible and Energy-efficient Accelerator for Graph Convolutional Neural Networks
    Li, Jiajun
    Louri, Ahmed
    Karanth, Avinash
    Bunescu, Razvan
    [J]. 2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 775 - 788
  • [7] Mixed-Clipping Quantization for Convolutional Neural Networks
    Chang, Libo
    [J]. Institute of Computing Technology (33) : 553 - 559
  • [8] An efficient segmented quantization for graph neural networks
    Dai, Yue
    Tang, Xulong
    Zhang, Youtao
    [J]. CCF TRANSACTIONS ON HIGH PERFORMANCE COMPUTING, 2022, 4 (04) : 461 - 473