SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks

Cited by: 68
Authors
Faraone, Julian [1 ]
Fraser, Nicholas [2 ]
Blott, Michaela [2 ]
Leong, Philip H. W. [1 ]
机构
[1] Univ Sydney, Sydney, NSW, Australia
[2] Xilinx Res Labs, Dublin, Ireland
Keywords
DOI
10.1109/CVPR.2018.00452
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy in constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited-entry codebook. For very low precisions, such as binary or ternary networks with 1-8 bit activations, the information loss from quantization leads to significant accuracy degradation due to large gradient mismatches between the forward and backward functions. In this paper, we introduce a quantization method to reduce this loss by learning a symmetric codebook for particular weight subgroups. These subgroups are determined based on their locality in the weight matrix, such that the hardware simplicity of the low-precision representations is preserved. Empirically, we show that symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights and activations. We also show that this representation imposes minimal or no hardware overhead relative to more coarse-grained approaches. Source code is available at https://www.github.com/julianfaraone/SYQ.
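For readers who want a concrete picture of the idea, the sketch below illustrates a symmetric quantizer with learnable per-subgroup scaling coefficients and a straight-through estimator for the gradient. It is a minimal PyTorch illustration under assumed details (binary {-1, +1} codes, one scale per weight-matrix row as the subgroup); the names `BinarySTE` and `symmetric_quantize` are illustrative and are not taken from the paper's released code.

```python
import torch

class BinarySTE(torch.autograd.Function):
    """Binarize weights to {-1, +1} in the forward pass; use a
    straight-through estimator (identity gradient) in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through: pass the gradient unchanged


def symmetric_quantize(weights, scales):
    """Symmetric per-subgroup quantization (illustrative).

    weights: (rows, cols) latent full-precision weight matrix.
    scales:  (rows,) learnable positive coefficients, one per row subgroup,
             so row i is represented by the codebook {-scales[i], +scales[i]}.
    """
    codes = BinarySTE.apply(weights)                  # {-1, +1} codes
    return scales.clamp(min=1e-8).unsqueeze(1) * codes


# Toy usage: both the latent weights and the scales receive gradients.
w = torch.randn(4, 8, requires_grad=True)             # latent weights
alpha = torch.ones(4, requires_grad=True)             # one scale per row subgroup
loss = symmetric_quantize(w, alpha).pow(2).sum()      # stand-in for a task loss
loss.backward()
print(alpha.grad.shape, w.grad.shape)                 # torch.Size([4]) torch.Size([4, 8])
```

The paper also considers finer-grained subgroups (e.g. pixel-wise) and ternary codes; the row-wise binary case above is just the simplest instance of the symmetric-codebook idea.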
Pages: 4300-4309
Number of pages: 10
Related Papers
50 records in total
  • [41] Yang, Siyuan; Mao, Yongyi. Vector Quantization of Deep Convolutional Neural Networks With Learned Codebook. 2022 17th Canadian Workshop on Information Theory (CWIT), 2022: 39-44.
  • [42] Gong, Cheng; Lu, Ye; Xie, Kunpeng; Jin, Zongming; Li, Tao; Wang, Yanzhi. Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(11): 3178-3193.
  • [43] Geng, Xue; Fu, Jie; Zhao, Bin; Lin, Jie; Aly, Mohamed M. Sabry; Pal, Christopher; Chandrasekhar, Vijay. Dataflow-based Joint Quantization for Deep Neural Networks. 2019 Data Compression Conference (DCC), 2019: 574.
  • [44] Fei, Wen; Dai, Wenrui; Zhang, Liang; Zhang, Luoming; Li, Chenglin; Zou, Junni; Xiong, Hongkai. Latent Weight Quantization for Integerized Training of Deep Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(4): 2816-2832.
  • [45] Sahoo, Doyen; Pham, Quang; Lu, Jing; Hoi, Steven C. H. Online Deep Learning: Learning Deep Neural Networks on the Fly. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018: 2660-2666.
  • [46] Hoefler, Torsten; Alistarh, Dan; Ben-Nun, Tal; Dryden, Nikoli; Peste, Alexandra. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. Journal of Machine Learning Research, 2021, 22.
  • [47] Bala, Anu; Yang, Xiaohan; Adeyemo, Adedotun; Jabir, Abusaleh. Efficient and Low Overhead Memristive Activation Circuit for Deep Learning Neural Networks. Journal of Low Power Electronics, 2019, 15(2): 214-223.
  • [48] Hoefler, Torsten; Alistarh, Dan; Ben-Nun, Tal; Dryden, Nikoli; Peste, Alexandra. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. Journal of Machine Learning Research, 2021, 23.
  • [49] Xie, Tianying; Li, Yantao. Efficient Integer Vector Homomorphic Encryption Using Deep Learning for Neural Networks. Neural Information Processing (ICONIP 2018), Part I, 2018, 11301: 83-95.
  • [50] Jin, Hyundong; Yun, Kimin; Kim, Eunwoo. Gating Mechanism in Deep Neural Networks for Resource-Efficient Continual Learning. IEEE Access, 2022, 10: 18776-18786.