Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Cited by: 6
Authors
Ma, Yuexiao [1]
Li, Huixia [2]
Zheng, Xiawu [3]
Xiao, Xuefeng [2]
Wang, Rui [2]
Wen, Shilei [2]
Pan, Xin [2]
Chao, Fei [1]
Ji, Rongrong [1,4]
Affiliations
[1] Xiamen Univ, Minist Educ China, Key Lab Multimedia Trusted Percept & Efficient Co, Sch Informat, Xiamen 361005, Peoples R China
[2] ByteDance Inc, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Xiamen Univ, Shenzhen Res Inst, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
DOI
10.1109/CVPR52729.2023.00768
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Post-training quantization (PTQ) is widely regarded as one of the most efficient compression methods in practice, benefiting from its data privacy and low computation costs. We argue that an oscillation problem has been overlooked in existing PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it is essential in PTQ. We then address it by introducing a theoretically principled and generalized framework. In particular, we first formulate oscillation in PTQ and prove that the problem is caused by the difference in module capacity. To this end, we define module capacity (ModCap) under both data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art by 1.9%. The improvement is even more significant for small-model quantization, e.g., surpassing BRECQ by 6.61% on MobileNetV2 ×0.5.
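The selection step described in the abstract (scoring adjacent modules by their capacity differential and jointly optimizing the top-k pairs) can be illustrated with a small sketch. The snippet below is a hypothetical, simplified illustration, not the paper's implementation: module_capacity uses a naive parameter-count proxy rather than the actual ModCap definition, and topk_adjacent_differentials merely returns the positions whose module pairs would then be reconstructed jointly during PTQ.

# Hypothetical sketch of the top-k differential selection idea from the abstract.
# The capacity proxy (parameter count) and all names here are illustrative
# assumptions, not the paper's ModCap metric or code.

import torch
import torch.nn as nn


def module_capacity(module: nn.Module) -> float:
    """Toy capacity proxy: number of learnable parameters in the module."""
    return float(sum(p.numel() for p in module.parameters()))


def topk_adjacent_differentials(modules, k):
    """Return indices i with the largest |capacity(m[i+1]) - capacity(m[i])|.

    Each selected pair (modules[i], modules[i+1]) would then be optimized and
    quantized jointly, instead of module by module, which is the mechanism the
    abstract sketches for suppressing oscillation.
    """
    caps = [module_capacity(m) for m in modules]
    diffs = torch.tensor([abs(caps[i + 1] - caps[i]) for i in range(len(caps) - 1)])
    k = min(k, diffs.numel())
    return torch.topk(diffs, k).indices.tolist()


if __name__ == "__main__":
    # Example: a small stack of blocks with very different widths.
    blocks = nn.ModuleList([
        nn.Conv2d(16, 16, 3, padding=1),
        nn.Conv2d(16, 128, 3, padding=1),
        nn.Conv2d(128, 128, 3, padding=1),
        nn.Conv2d(128, 32, 3, padding=1),
    ])
    joint_positions = topk_adjacent_differentials(blocks, k=2)
    print("jointly optimize module pairs at positions:", joint_positions)

In the paper's framework the capacity measure is defined for both data-dependent and data-free settings; the parameter-count proxy above only stands in for whichever score is used.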
Pages: 7950-7959
Number of pages: 10
Related Papers (50 records in total)
• [1] Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi. Loss aware post-training quantization. Machine Learning, 2021, 110: 3245-3262.
• [2] Liu, Zhenhua; Wang, Yunhe; Han, Kai; Zhang, Wei; Ma, Siwei; Gao, Wen. Post-Training Quantization for Vision Transformer. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
• [3] Shang, Yuzhang; Yuan, Zhihang; Xie, Bin; Wu, Bingzhe; Yan, Yan. Post-training Quantization on Diffusion Models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 1972-1981.
• [4] Diao, Huabin; Li, Gongyan; Xu, Shaoyun; Kong, Chao; Wang, Wei. Attention Round for post-training quantization. Neurocomputing, 2024, 565.
• [5] Nahshan, Yury; Chmiel, Brian; Baskin, Chaim; Zheltonozhskii, Evgenii; Banner, Ron; Bronstein, Alex M.; Mendelson, Avi. Loss aware post-training quantization. Machine Learning, 2021, 110 (11-12): 3245-3262.
• [6] Shomron, Gil; Gabbay, Freddy; Kurzum, Samer; Weiser, Uri. Post-Training Sparsity-Aware Quantization. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34.
• [7] Zhang, Luoming; He, Yefei; Fei, Wen; Lou, Zhenyu; Wu, Weijia; Ying, Yangwei; Zhou, Hong. Towards accurate post-training quantization for reparameterized models. Applied Intelligence, 2025, 55 (07).
• [8] Ding, Yifu; Qin, Haotong; Yan, Qinghua; Chai, Zhenhua; Liu, Junjie; Wei, Xiaolin; Liu, Xianglong. Towards Accurate Post-Training Quantization for Vision Transformer. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 5380-5388.
• [9] Chu, Tianshu; Yang, Zuopeng; Huang, Xiaolin. Improving the Post-Training Neural Network Quantization by Prepositive Feature Quantization. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (04): 3056-3060.
• [10] Khayrov, E. M.; Malsagov, M. Yu.; Karandashev, I. M. Post-training Quantization of Deep Neural Network Weights. Advances in Neural Computation, Machine Learning, and Cognitive Research III, 2020, 856: 230-238.