Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Cited by: 6
Authors
Ma, Yuexiao [1]
Li, Huixia [2]
Zheng, Xiawu [3]
Xiao, Xuefeng [2]
Wang, Rui [2]
Wen, Shilei [2]
Pan, Xin [2]
Chao, Fei [1]
Ji, Rongrong [1,4]
Affiliations
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Sch Informat, Xiamen 361005, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Xiamen Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00768
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Post-training quantization (PTQ) is widely regarded as one of the most practically efficient compression methods, benefiting from its data privacy and low computation costs. We argue that oscillation is an overlooked problem in PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it matters in PTQ. We then address it by introducing a principled and generalized theoretical framework. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in capacity between adjacent modules. We therefore define the module capacity (ModCap) under data-dependent and data-free scenarios, where the capacity differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The gain is more significant for small models, e.g., surpassing the BRECQ method by 6.61% on MobileNetV2 x0.5.
Pages: 7950 - 7959
Number of pages: 10
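
To make the top-k selection idea from the abstract concrete, the sketch below ranks the capacity differentials of adjacent modules and returns the pairs that would be optimized and quantized jointly. This is a minimal illustration under assumed definitions, not the authors' implementation: the capacity proxy and the names module_capacity and select_joint_modules are hypothetical, and the paper's actual ModCap definition is more involved.

```python
# Minimal sketch (not the authors' code): rank adjacent-module capacity
# differentials and pick the top-k pairs for joint optimization.
from typing import List, Sequence, Tuple


def module_capacity(weight_numel: int, bit_width: int) -> float:
    """Toy data-free capacity proxy: parameter count scaled by bit width.
    Placeholder only; the paper defines ModCap more carefully."""
    return float(weight_numel * bit_width)


def select_joint_modules(capacities: Sequence[float], k: int) -> List[Tuple[int, int]]:
    """Return the indices of the k adjacent module pairs with the largest
    capacity differential, i.e. the pairs most prone to oscillation and
    therefore worth optimizing jointly rather than one module at a time."""
    diffs = [abs(capacities[i + 1] - capacities[i]) for i in range(len(capacities) - 1)]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return [(i, i + 1) for i in order[:k]]


if __name__ == "__main__":
    # Example: five modules with uneven capacities; fuse the 2 most mismatched pairs.
    caps = [module_capacity(n, b)
            for n, b in [(4096, 2), (16384, 2), (4096, 4), (65536, 2), (8192, 2)]]
    print(select_joint_modules(caps, k=2))
```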