Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Cited by: 6
Authors
Ma, Yuexiao [1]
Li, Huixia [2]
Zheng, Xiawu [3]
Xiao, Xuefeng [2]
Wang, Rui [2]
Wen, Shilei [2]
Pan, Xin [2]
Chao, Fei [1]
Ji, Rongrong [1,4]
Affiliations
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Sch Informat, Xiamen 361005, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Xiamen Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
DOI
10.1109/CVPR52729.2023.00768
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods in practice, benefiting from its data privacy and low computation costs. We argue that an oscillation problem has been overlooked in existing PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it is essential in PTQ. We then address it by introducing a theoretically principled and generalized framework. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in module capacity. To this end, we define module capacity (ModCap) under both data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art by 1.9%. The improvement is even more significant for small-model quantization, e.g., surpassing BRECQ by 6.61% on MobileNetV2 ×0.5.
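The selection step described in the abstract (scoring adjacent modules by their capacity differential and jointly optimizing the top-k pairs) can be illustrated with a small sketch. The snippet below is a hypothetical, simplified illustration, not the paper's implementation: module_capacity uses a naive parameter-count proxy rather than the actual ModCap definition, and topk_adjacent_differentials merely returns the positions whose module pairs would then be reconstructed jointly during PTQ.

# Hypothetical sketch of the top-k differential selection idea from the abstract.
# The capacity proxy (parameter count) and all names here are illustrative
# assumptions, not the paper's ModCap metric or code.

import torch
import torch.nn as nn


def module_capacity(module: nn.Module) -> float:
    """Toy capacity proxy: number of learnable parameters in the module."""
    return float(sum(p.numel() for p in module.parameters()))


def topk_adjacent_differentials(modules, k):
    """Return indices i with the largest |capacity(m[i+1]) - capacity(m[i])|.

    Each selected pair (modules[i], modules[i+1]) would then be optimized and
    quantized jointly, instead of module by module, which is the mechanism the
    abstract sketches for suppressing oscillation.
    """
    caps = [module_capacity(m) for m in modules]
    diffs = torch.tensor([abs(caps[i + 1] - caps[i]) for i in range(len(caps) - 1)])
    k = min(k, diffs.numel())
    return torch.topk(diffs, k).indices.tolist()


if __name__ == "__main__":
    # Example: a small stack of blocks with very different widths.
    blocks = nn.ModuleList([
        nn.Conv2d(16, 16, 3, padding=1),
        nn.Conv2d(16, 128, 3, padding=1),
        nn.Conv2d(128, 128, 3, padding=1),
        nn.Conv2d(128, 32, 3, padding=1),
    ])
    joint_positions = topk_adjacent_differentials(blocks, k=2)
    print("jointly optimize module pairs at positions:", joint_positions)

In the paper's framework the capacity measure is defined for both data-dependent and data-free settings; the parameter-count proxy above only stands in for whichever score is used.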
Pages: 7950-7959
Number of pages: 10
Related Papers (50 records in total)
• [1] Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi. Loss aware post-training quantization. Machine Learning, 2021, 110: 3245-3262.
• [2] Liu, Zhenhua; Wang, Yunhe; Han, Kai; Zhang, Wei; Ma, Siwei; Gao, Wen. Post-Training Quantization for Vision Transformer. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
• [3] Shang, Yuzhang; Yuan, Zhihang; Xie, Bin; Wu, Bingzhe; Yan, Yan. Post-training Quantization on Diffusion Models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 1972-1981.
• [4] Diao, Huabin; Li, Gongyan; Xu, Shaoyun; Kong, Chao; Wang, Wei. Attention Round for post-training quantization. Neurocomputing, 2024, 565.
• [5] Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi. Loss aware post-training quantization. Machine Learning, 2021, 110 (11-12): 3245-3262.
• [6] Shomron, Gil; Gabbay, Freddy; Kurzum, Samer; Weiser, Uri. Post-Training Sparsity-Aware Quantization. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
• [7] Zhang, Luoming; He, Yefei; Fei, Wen; Lou, Zhenyu; Wu, Weijia; Ying, Yangwei; Zhou, Hong. Towards accurate post-training quantization for reparameterized models. Applied Intelligence, 2025, 55 (07).
• [8] Ding, Yifu; Qin, Haotong; Yan, Qinghua; Chai, Zhenhua; Liu, Junjie; Wei, Xiaolin; Liu, Xianglong. Towards Accurate Post-Training Quantization for Vision Transformer. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 5380-5388.
• [9] Chu, Tianshu; Yang, Zuopeng; Huang, Xiaolin. Improving the Post-Training Neural Network Quantization by Prepositive Feature Quantization. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (04): 3056-3060.
• [10] Khayrov, E. M.; Malsagov, M. Yu.; Karandashev, I. M. Post-training Quantization of Deep Neural Network Weights. Advances in Neural Computation, Machine Learning, and Cognitive Research III, 2020, 856: 230-238.