Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Cited by: 6
Authors
Ma, Yuexiao [1]
Li, Huixia [2]
Zheng, Xiawu [3]
Xiao, Xuefeng [2]
Wang, Rui [2]
Wen, Shilei [2]
Pan, Xin [2]
Chao, Fei [1]
Ji, Rongrong [1,4]
Affiliations
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Sch Informat, Xiamen 361005, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Xiamen Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
DOI
10.1109/CVPR52729.2023.00768
Chinese Library Classification (CLC) code
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Post-training quantization (PTQ) is widely regarded as one of the most practically efficient compression methods, benefiting from its data privacy and low computation costs. We argue that oscillation is an overlooked problem in PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it matters in PTQ. We then address it by introducing a principled and generalized theoretical framework. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in capacity between adjacent modules. We therefore define the module capacity (ModCap) under data-dependent and data-free scenarios, where the capacity differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The gain is more significant for small models, e.g., surpassing the BRECQ method by 6.61% on MobileNetV2 x0.5.
Pages: 7950 - 7959
Number of pages: 10
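
To make the top-k selection idea from the abstract concrete, the sketch below ranks the capacity differentials of adjacent modules and returns the pairs that would be optimized and quantized jointly. This is a minimal illustration under assumed definitions, not the authors' implementation: the capacity proxy and the names module_capacity and select_joint_modules are hypothetical, and the paper's actual ModCap definition is more involved.

```python
# Minimal sketch (not the authors' code): rank adjacent-module capacity
# differentials and pick the top-k pairs for joint optimization.
from typing import List, Sequence, Tuple


def module_capacity(weight_numel: int, bit_width: int) -> float:
    """Toy data-free capacity proxy: parameter count scaled by bit width.
    Placeholder only; the paper defines ModCap more carefully."""
    return float(weight_numel * bit_width)


def select_joint_modules(capacities: Sequence[float], k: int) -> List[Tuple[int, int]]:
    """Return the indices of the k adjacent module pairs with the largest
    capacity differential, i.e. the pairs most prone to oscillation and
    therefore worth optimizing jointly rather than one module at a time."""
    diffs = [abs(capacities[i + 1] - capacities[i]) for i in range(len(capacities) - 1)]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return [(i, i + 1) for i in order[:k]]


if __name__ == "__main__":
    # Example: five modules with uneven capacities; fuse the 2 most mismatched pairs.
    caps = [module_capacity(n, b)
            for n, b in [(4096, 2), (16384, 2), (4096, 4), (65536, 2), (8192, 2)]]
    print(select_joint_modules(caps, k=2))
```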