Overcoming Oscillations in Quantization-Aware Training

Cited by: 0

Authors:

Nagel, Markus [1]

Fournarakis, Marios [1]

Bondarenko, Yelysei [1]

Blankevoort, Tijmen [1]

Affiliations:

[1] Qualcomm AI Res, San Diego, CA 92121 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid points. The importance of this effect and its impact on quantization-aware training (QAT) are not well understood or investigated in the literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise during training. These effects are particularly pronounced in low-bit (≤ 4 bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets. In our analysis we investigate several previously proposed QAT algorithms and show that most of these are unable to overcome oscillations. Finally, we propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing. We demonstrate that our algorithms achieve state-of-the-art accuracy for low-bit (3 and 4 bits) weight and activation quantization of efficient architectures, such as MobileNetV2, MobileNetV3, and EfficientNet-lite on ImageNet. Our source code is available at https://github.com/qualcomm-ai-research/oscillations-qat.
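
The oscillation effect described in the abstract can be reproduced on a one-weight toy problem. The sketch below is not taken from the authors' repository; the function names, the quadratic loss, the straight-through gradient, and the `damp` penalty are all assumptions made for illustration. A latent weight `w` is rounded to a grid, the gradient of the loss on the quantized value is applied directly to `w`, and the target `w_star` lies between two grid points. The optional `damp` term pulls `w` toward its current grid point, which is one plausible reading of "dampening" and should not be taken as the paper's exact regularizer.

```python
import math


def quantize(w, scale=1.0):
    """Round-to-nearest onto a uniform grid (round half up)."""
    return scale * math.floor(w / scale + 0.5)


def simulate_qat(w_star=0.3, w0=0.0, scale=1.0, lr=0.1, steps=200, damp=0.0):
    """Toy QAT loop: minimize 0.5 * (q(w) - w_star)^2 with a
    straight-through estimator, i.e. d q(w) / d w is treated as 1.

    damp > 0 adds the gradient of damp * (w - q)^2, with q held constant
    (illustrative assumption, not the paper's exact formulation).
    Returns the list of quantized values seen at each step.
    """
    w = w0
    quantized = []
    for _ in range(steps):
        q = quantize(w, scale)
        grad = (q - w_star) + 2.0 * damp * (w - q)
        w -= lr * grad
        quantized.append(q)
    return quantized


def count_flips(values):
    """Number of steps at which the quantized weight changes grid point."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


if __name__ == "__main__":
    plain = simulate_qat()
    damped = simulate_qat(damp=0.5)
    print("flips without dampening:", count_flips(plain[-100:]))
    print("flips with dampening:   ", count_flips(damped[-100:]))
```

Without the penalty, the latent weight is pushed toward the rounding threshold from both sides, so the quantized value keeps switching between the two neighbouring grid points instead of settling. With a sufficiently large penalty in this toy setting, the latent weight comes to rest inside one bin and the switching stops.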


Pages: 13

50 items in total

[41] Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference
Hawks, Benjamin
Duarte, Javier
Fraser, Nicholas J.
Pappalardo, Alessandro
Nhan Tran
Umuroglu, Yaman
[J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
[42] QuantBayes: Weight Optimization for Memristive Neural Networks via Quantization-Aware Bayesian Inference
Zhou, Yue
Hu, Xiaofang
Wang, Lidan
Zhou, Guangdong
Duan, Shukai
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68 (12) : 4851 - 4861
[43] QPA: A Quantization-Aware Piecewise Polynomial Approximation Methodology for Hardware-Efficient Implementations
Geng, Haoran
Chen, Xiaoliang
Zhao, Ning
Du, Yuan
Du, Li
[J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2023, 31 (07) : 931 - 944
[44] Quantization-Aware Binaural MWF Based Noise Reduction Incorporating External Wireless Devices
Zhang, Jie
Li, Changheng
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3118 - 3131
[45] Quantization-Aware NN Layers with High-throughput FPGA Implementation for Edge AI
Pistellato, Mara
Bergamasco, Filippo
Bigaglia, Gianluca
Gasparetto, Andrea
Albarelli, Andrea
Boschetti, Marco
Passerone, Roberto
[J]. SENSORS, 2023, 23 (10)
[46] Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge
Lu, Yao
Rodriguez, Hiram Rayo Torres
Vogel, Sebastian
van de Waterlaat, Nick
Jancura, Pavol
[J]. PROCEEDINGS 2023 IEEE/ACM INTERNATIONAL WORKSHOP ON COMPILERS, DEPLOYMENT, AND TOOLING FOR EDGE AI, CODAI 2023, 2023, : 1 - 5
[47] Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Li, Zhengang
Lu, Alec
Xie, Yanyue
Kong, Zhenglun
Sun, Mengshu
Tang, Hao
Xue, Zhong Jia
Dong, Peiyan
Ding, Caiwen
Wang, Yanzhi
Lin, Xue
Fang, Zhenman
[J]. PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 324 - 337
[48] A reconfigurable multi-precision quantization-aware nonlinear activation function hardware module for DNNs
Hong, Qi
Liu, Zhiming
Long, Qiang
Tong, Hao
Zhang, Tianxu
Zhu, Xiaowen
Zhao, Yunong
Ru, Hua
Zha, Yuxing
Zhou, Ziyuan
Wu, Jiashun
Tan, Hongtao
Hong, Weiqiang
Xu, Yaohua
Guo, Xiaohui
[J]. MICROELECTRONICS JOURNAL, 2024, 151
[49] One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training
Ma, Lianbo
Zhou, Yuee
Ma, Jianlun
Yu, Guo
Li, Qing
[J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14246 - 14254
[50] Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks
Liu, Fuqiang
Liu, Chenchen
[J]. 2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2018,
