Attention Round for post-training quantization

Cited by: 7
Authors
Diao, Huabin [1 ]
Li, Gongyan [2 ]
Xu, Shaoyun [2 ]
Kong, Chao [1 ]
Wang, Wei [1 ]
Affiliations
[1] Anhui Polytech Univ, Beijing Middle Rd, Wuhu 241000, Anhui, Peoples R China
[2] Chinese Acad Sci, Inst Microelect, 3 Beituocheng West Rd, Beijing 100029, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Post-training quantization; Attention Round; Mixed precision;
DOI
10.1016/j.neucom.2023.127012
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Quantization methods for convolutional neural network models can be broadly categorized into post-training quantization (PTQ) and quantization-aware training (QAT). While PTQ offers the advantage of requiring only a small portion of the data for quantization, the resulting quantized model may not be as accurate as one produced by QAT. To address this limitation, this paper proposes a novel quantization function named Attention Round. Unlike traditional quantization functions, which map a 32-bit floating-point value w only to its nearby quantization levels, Attention Round allows w to be mapped to any quantization level in the entire quantization space, expanding the quantization optimization space. The probability of mapping w to a given quantization level is inversely correlated with the distance between w and that level and is regulated by a Gaussian decay function. Furthermore, to tackle the challenge of mixed-precision quantization, this paper introduces a lossy coding length measure that assigns quantization precision to the different layers of the model, eliminating the need to solve a combinatorial optimization problem. Experimental evaluations on various models demonstrate the effectiveness of the proposed method. Notably, for ResNet18 and MobileNetV2, the proposed PTQ approach achieves quantization performance comparable to QAT while using only 1024 training samples and 10 minutes of quantization time.
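The following is a minimal illustrative sketch, in Python, of the rounding idea described in the abstract: rather than deterministically rounding a weight to its nearest level, a level is drawn from the full quantization grid with probabilities that decay with distance under a Gaussian kernel. The function name attention_round, the sigma parameter, and the per-weight sampling form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def attention_round(w, levels, sigma=0.5, rng=None):
    """Map a float weight w to one of the given quantization levels.

    Instead of nearest-neighbor rounding, every level is a candidate; levels
    closer to w receive higher probability via a Gaussian decay of distance.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = np.asarray(levels, dtype=np.float64)
    dist = np.abs(levels - w)
    # Gaussian-decay weighting: nearby levels dominate, but distant levels
    # retain a small, nonzero probability of being selected.
    weights = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    probs = weights / weights.sum()
    return rng.choice(levels, p=probs)

# Example: quantize a weight to a uniform 4-level grid in [-1, 1].
grid = np.linspace(-1.0, 1.0, num=4)
print(attention_round(0.3, grid, sigma=0.5))
```

In this sketch, sigma controls how sharply the probability mass concentrates on the nearest levels; a small sigma approaches standard nearest-level rounding, while a larger sigma widens the effective search space, which is the intuition behind enlarging the quantization optimization space.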
Pages: 10
Related papers
50 records in total
  • [21] PTMQ: Post-training Multi-Bit Quantization of Neural Networks
    Xu, Ke
    Li, Zhongcheng
    Wang, Shanshan
    Zhang, Xingyi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 16193 - 16201
  • [22] Fine-grained Data Distribution Alignment for Post-Training Quantization
    Zhong, Yunshan
    Lin, Mingbao
    Chen, Mengzhao
    Li, Ke
    Shen, Yunhang
    Chao, Fei
    Wu, Yongjian
    Ji, Rongrong
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 70 - 86
  • [23] LKBQ: PUSHING THE LIMIT OF POST-TRAINING QUANTIZATION TO EXTREME 1 BIT
    Li, Tianxiang
    Chen, Bin
    Wang, Qian-Wei
    Huang, Yujun
    Xia, Shu-Tao
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1775 - 1779
  • [24] Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
    Ma, Yuexiao
    Li, Huixia
    Zheng, Xiawu
    Xiao, Xuefeng
    Wang, Rui
    Wen, Shilei
    Pan, Xin
    Chao, Fei
    Ji, Rongrong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7950 - 7959
  • [25] PQ-SAM: Post-training Quantization for Segment Anything Model
    Liu, Xiaoyu
    Ding, Xin
    Yu, Lei
    Xi, Yuanyuan
    Li, Wei
    Tu, Zhijun
    Hu, Jie
    Chen, Hanting
    Yin, Baoqun
    Xiong, Zhiwei
    COMPUTER VISION - ECCV 2024, PT X, 2025, 15068 : 420 - 437
  • [26] AdaLog: Post-training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
    Wu, Zhuguanyu
    Chen, Jiaxin
    Zhong, Hanwen
    Huang, Di
    Wang, Yunhong
    COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085 : 411 - 427
  • [27] PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization
    Yuan, Zhihang
    Xue, Chenhao
    Chen, Yiqi
    Wu, Qiang
    Sun, Guangyu
    COMPUTER VISION, ECCV 2022, PT XII, 2022, 13672 : 191 - 207
  • [28] Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks
    Latotzke, Cecilia
    Balim, Batuhan
    Gemmeke, Tobias
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1559 - 1566
  • [29] Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
    Frantar, Elias
    Singh, Sidak Pal
    Alistarh, Dan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [30] Real Post-Training Quantization Framework for Resource-Optimized Multiplier in LLMs
    Seo, Minseok
    Jeong, Seongho
    Lee, Hyuk-Jae
    Nguyen, Xuan Truong
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 497 - 501