SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks

Cited by: 68
Authors
Faraone, Julian [1 ]
Fraser, Nicholas [2 ]
Blott, Michaela [2 ]
Leong, Philip H. W. [1 ]
机构
[1] Univ Sydney, Sydney, NSW, Australia
[2] Xilinx Res Labs, Dublin, Ireland
Keywords
DOI
10.1109/CVPR.2018.00452
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy in constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited-entry codebook. For very low precisions, such as binary or ternary networks with 1-8 bit activations, the information loss from quantization leads to significant accuracy degradation due to large gradient mismatches between the forward and backward functions. In this paper, we introduce a quantization method to reduce this loss by learning a symmetric codebook for particular weight subgroups. These subgroups are determined based on their locality in the weight matrix, such that the hardware simplicity of the low-precision representations is preserved. Empirically, we show that symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights and activations. We also show that this representation imposes minimal or no hardware overhead relative to more coarse-grained approaches. Source code is available at https://www.github.com/julianfaraone/SYQ.
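For readers who want a concrete picture of the idea, the sketch below illustrates a symmetric quantizer with learnable per-subgroup scaling coefficients and a straight-through estimator for the gradient. It is a minimal PyTorch illustration under assumed details (binary {-1, +1} codes, one scale per weight-matrix row as the subgroup); the names `BinarySTE` and `symmetric_quantize` are illustrative and are not taken from the paper's released code.

```python
import torch

class BinarySTE(torch.autograd.Function):
    """Binarize weights to {-1, +1} in the forward pass; use a
    straight-through estimator (identity gradient) in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through: pass the gradient unchanged


def symmetric_quantize(weights, scales):
    """Symmetric per-subgroup quantization (illustrative).

    weights: (rows, cols) latent full-precision weight matrix.
    scales:  (rows,) learnable positive coefficients, one per row subgroup,
             so row i is represented by the codebook {-scales[i], +scales[i]}.
    """
    codes = BinarySTE.apply(weights)                  # {-1, +1} codes
    return scales.clamp(min=1e-8).unsqueeze(1) * codes


# Toy usage: both the latent weights and the scales receive gradients.
w = torch.randn(4, 8, requires_grad=True)             # latent weights
alpha = torch.ones(4, requires_grad=True)             # one scale per row subgroup
loss = symmetric_quantize(w, alpha).pow(2).sum()      # stand-in for a task loss
loss.backward()
print(alpha.grad.shape, w.grad.shape)                 # torch.Size([4]) torch.Size([4, 8])
```

The paper also considers finer-grained subgroups (e.g. pixel-wise) and ternary codes; the row-wise binary case above is just the simplest instance of the symmetric-codebook idea.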
Pages: 4300-4309
Number of pages: 10
Related Papers
50 records in total
  • [41] Yang, Siyuan; Mao, Yongyi. Vector Quantization of Deep Convolutional Neural Networks With Learned Codebook. 2022 17th Canadian Workshop on Information Theory (CWIT), 2022: 39-44.
  • [42] Gong, Cheng; Lu, Ye; Xie, Kunpeng; Jin, Zongming; Li, Tao; Wang, Yanzhi. Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(11): 3178-3193.
  • [43] Geng, Xue; Fu, Jie; Zhao, Bin; Lin, Jie; Aly, Mohamed M. Sabry; Pal, Christopher; Chandrasekhar, Vijay. Dataflow-based Joint Quantization for Deep Neural Networks. 2019 Data Compression Conference (DCC), 2019: 574.
  • [44] Fei, Wen; Dai, Wenrui; Zhang, Liang; Zhang, Luoming; Li, Chenglin; Zou, Junni; Xiong, Hongkai. Latent Weight Quantization for Integerized Training of Deep Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(4): 2816-2832.
  • [45] Sahoo, Doyen; Pham, Quang; Lu, Jing; Hoi, Steven C. H. Online Deep Learning: Learning Deep Neural Networks on the Fly. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018: 2660-2666.
  • [46] Hoefler, Torsten; Alistarh, Dan; Ben-Nun, Tal; Dryden, Nikoli; Peste, Alexandra. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. Journal of Machine Learning Research, 2021, 22.
  • [47] Bala, Anu; Yang, Xiaohan; Adeyemo, Adedotun; Jabir, Abusaleh. Efficient and Low Overhead Memristive Activation Circuit for Deep Learning Neural Networks. Journal of Low Power Electronics, 2019, 15(2): 214-223.
  • [48] Hoefler, Torsten; Alistarh, Dan; Ben-Nun, Tal; Dryden, Nikoli; Peste, Alexandra. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. Journal of Machine Learning Research, 2021, 23.
  • [49] Xie, Tianying; Li, Yantao. Efficient Integer Vector Homomorphic Encryption Using Deep Learning for Neural Networks. Neural Information Processing (ICONIP 2018), Part I, 2018, 11301: 83-95.
  • [50] Jin, Hyundong; Yun, Kimin; Kim, Eunwoo. Gating Mechanism in Deep Neural Networks for Resource-Efficient Continual Learning. IEEE Access, 2022, 10: 18776-18786.