Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Cited by: 6
Authors
Ma, Yuexiao [1 ]
Li, Huixia [2 ]
Zheng, Xiawu [3 ]
Xiao, Xuefeng [2 ]
Wang, Rui [2 ]
Wen, Shilei [2 ]
Pan, Xin [2 ]
Chao, Fei [1 ]
Ji, Rongrong [1 ,4 ]
Affiliations
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Sch Informat, Xiamen 361005, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Xiamen Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
DOI
10.1109/CVPR52729.2023.00768
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Post-training quantization (PTQ) is widely regarded as one of the most practical and efficient compression methods, benefiting from data privacy and low computation cost. We argue that the oscillation problem in PTQ methods has been overlooked. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it is essential in PTQ. We then address it by introducing a principled and generalized theoretical framework. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in module capacity. To this end, we define the module capacity (ModCap) under data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art by 1.9%. The gain becomes more significant for small-model quantization, e.g., surpassing BRECQ by 6.61% on MobileNetV2 x0.5.
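The abstract outlines the core selection step: score each module's capacity (ModCap), treat the differentials between adjacent modules as an oscillation indicator, and jointly optimize the modules with the top-k differentials. The Python sketch below is a rough illustration of that selection step only; the function name select_joint_modules, the placeholder module_capacity callable, and the toy capacity scores are assumptions for illustration, not the paper's actual ModCap definition or implementation.

# Minimal sketch of the module-selection idea described in the abstract.
# The exact ModCap definition is given in the paper; here `module_capacity`
# is a placeholder (assumption) returning a scalar score per module.

from typing import Callable, List, Sequence, Tuple


def select_joint_modules(
    modules: Sequence[str],
    module_capacity: Callable[[str], float],
    k: int,
) -> List[Tuple[str, str]]:
    """Pick the k adjacent-module pairs with the largest capacity difference.

    Per the abstract, oscillation is driven by capacity differences between
    adjacent modules, and the modules with the top-k differentials are
    jointly optimized and quantized.
    """
    caps = [module_capacity(m) for m in modules]
    # Differential between each pair of adjacent modules.
    diffs = [abs(caps[i + 1] - caps[i]) for i in range(len(modules) - 1)]
    # Indices of the k largest differentials.
    top_idx = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]
    # Each selected pair (module_i, module_{i+1}) is then optimized jointly
    # rather than module-by-module, which is what suppresses the oscillation.
    return [(modules[i], modules[i + 1]) for i in sorted(top_idx)]


if __name__ == "__main__":
    # Toy example with made-up capacity scores (illustrative only).
    demo_caps = {"block1": 1.0, "block2": 3.5, "block3": 3.6, "block4": 0.8}
    pairs = select_joint_modules(list(demo_caps), demo_caps.get, k=2)
    print(pairs)  # [('block1', 'block2'), ('block3', 'block4')]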
Pages: 7950-7959
Page count: 10
Related Papers
50 items in total
  • [31] Enhancing Learning Transfer Through Post-Training Activities
    Scott, W. S.
    McGraw, L.
    Sauer, T.
    Belton, H.
    Bittner, S.
    Lynn, D.
    Sharrer, J.
    Speicher, T.
    TRANSFUSION, 2011, 51 : 270A - 271A
  • [32] Linear Domain-aware Log-scale Post-training Quantization
    Kim, Sungrae
    Kim, Hyun
    2021 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-ASIA (ICCE-ASIA), 2021,
  • [33] Hybrid Post-Training Quantization for Super-Resolution Neural Network Compression
    Xu, Naijie
    Chen, Xiaohui
    Cao, Youlong
    Zhang, Wenyi
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 379 - 383
  • [34] Two Novel Non-Uniform Quantizers with Application in Post-Training Quantization
    Peric, Zoran
    Aleksic, Danijela
    Nikolic, Jelena
    Tomic, Stefan
    MATHEMATICS, 2022, 10 (19)
  • [35] Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
Liu, Xingchao
    Ye, Mao
    Zhou, Dengyong
    Liu, Qiang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8697 - 8705
  • [36] Selective Focus: Investigating Semantics Sensitivity in Post-training Quantization for Lane Detection
    Fan, Yunqian
    Wei, Xiuying
    Gong, Ruihao
    Ma, Yuqing
    Zhang, Xiangguo
    Zhang, Qi
    Liu, Xianglong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024, : 11936 - 11943
  • [37] Optimization-Based Post-Training Quantization With Bit-Split and Stitching
    Wang, Peisong
    Chen, Weihan
    He, Xiangyu
    Chen, Qiang
    Liu, Qingshan
    Cheng, Jian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 2119 - 2135
  • [38] ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
    Yao, Zhewei
    Aminabadi, Reza Yazdani
    Zhang, Minjia
    Wu, Xiaoxia
    Li, Conglong
    He, Yuxiong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [39] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric
    Liu, Jiawei
    Niu, Lin
    Yuan, Zhihang
    Yang, Dawei
    Wang, Xinggang
    Liu, Wenyu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 24427 - 24437
  • [40] Z-FOLD: A Frustratingly Easy Post-Training Quantization Scheme for LLMs
    Jeon, Yongkweon
    Lee, Chungman
    Park, Kyungphil
    Kim, Ho-young
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 14446 - 14461