Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers With Bridge Block Reconstruction for IoT Systems

Cited by: 2
Authors
Lee, Jemin [1 ]
Kwon, Yongin [1 ]
Park, Sihyeong [2 ]
Yu, Misun [1 ]
Park, Jeman [1 ]
Song, Hwanjun [3 ]
Affiliations
[1] Elect & Telecommun Res Inst, Artificial Intelligence Comp Res Lab, Daejeon 34129, South Korea
[2] Korea Elect Technol Inst, SoC Platform Res Ctr, Seongnam 13509, South Korea
[3] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, Daejeon 34141, South Korea
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 22
Keywords
Transformers; Quantization (signal); Bridges; Computer architecture; Convolution; Computational modeling; Internet of Things; Model compression; posttraining quantization (PTQ); vision transformer (ViT);
DOI
10.1109/JIOT.2024.3403844
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread deployment. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers and optimize attention computation for linear complexity. Additionally, posttraining quantization (PTQ) has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this article, we discover that applying existing PTQ methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributable to the following four challenges: 1) highly dynamic ranges; 2) zero-point overflow; 3) diverse normalization; and 4) limited model parameters (<5M). To overcome these challenges, we propose a new PTQ method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, and EfficientFormerV2). Compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT), we achieve significant average improvements of 17.73% for 8-bit and 29.75% for 6-bit quantization. We plan to release our code at https://gitlab.com/ones-ai/q-hyvit.
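The abstract's "zero-point overflow" challenge can be made concrete with standard asymmetric uniform quantization. The sketch below is a minimal Python illustration, not code from the paper: the activation range is hypothetical and the helper names are our own. When an activation's range excludes zero (here, an all-negative range), the derived zero-point lands outside the 8-bit integer grid and must be clamped back onto it, which shifts the representable range and introduces systematic error at one end.

    import numpy as np

    def asymmetric_qparams(x_min, x_max, n_bits=8):
        # Textbook asymmetric uniform quantization (not the paper's method):
        # map the observed range [x_min, x_max] onto the grid [0, 2^b - 1].
        qmin, qmax = 0, (1 << n_bits) - 1
        scale = (x_max - x_min) / (qmax - qmin)
        zero_point = int(round(qmin - x_min / scale))
        overflow = not (qmin <= zero_point <= qmax)
        return scale, zero_point, overflow

    def fake_quant(x, scale, zero_point, n_bits=8):
        # Quantize then dequantize so the rounding/clipping error is visible.
        qmax = (1 << n_bits) - 1
        q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

    # Hypothetical activation whose range excludes zero, as can occur around
    # the bridge between convolutional and transformer stages in hybrid ViTs.
    x = np.random.uniform(-40.0, -0.5, size=1000)
    scale, zp, overflow = asymmetric_qparams(x.min(), x.max())
    print(f"zero_point={zp}, overflow={overflow}")  # zp > 255 here
    zp_clamped = min(max(zp, 0), 255)               # what an int8 kernel must do
    x_hat = fake_quant(x, scale, zp_clamped)
    print(f"mean abs error after clamping: {np.abs(x - x_hat).mean():.4f}")

Because the clamped zero-point no longer matches the observed range, values at the far end of the distribution all collapse to the same integer code, which is one way the challenges enumerated above translate into the drastic accuracy drops the abstract reports when conventional PTQ calibration is applied unchanged to hybrid ViTs.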
Pages: 36384-36396 (13 pages)
Related Papers (9 records)
  • [1] Wu, Zhuguanyu; Chen, Jiaxin; Zhong, Hanwen; Huang, Di; Wang, Yunhong. AdaLog: Post-training Quantization for Vision Transformers with Adaptive Logarithm Quantizer. COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085: 411-427.
  • [2] Yuan, Zhihang; Xue, Chenhao; Chen, Yiqi; Wu, Qiang; Sun, Guangyu. PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization. COMPUTER VISION, ECCV 2022, PT XII, 2022, 13672: 191-207.
  • [3] Zhang, Weixing; Tian, Zhuang; Lin, Nan; Yang, Cong; Chen, Yongxia. Hessian matrix-aware comprehensive post-training quantization for vision transformers. JOURNAL OF ELECTRONIC IMAGING, 2025, 34 (01).
  • [4] Zhong, Yunshan; Huang, You; Hu, Jiawei; Zhang, Yuxin; Ji, Rongrong. Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04): 2676-2692.
  • [5] Li, Zhikai; Xiao, Junrui; Yang, Lianwei; Gu, Qingyi. RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 17181-17190.
  • [6] Liu, Yijiang; Yang, Huanrui; Dong, Zhen; Keutzer, Kurt; Du, Li; Zhang, Shanghang. NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 20321-20330.
  • [7] Liu, Caihua; Shi, Hongyang; He, Xinyu. FGPTQ-ViT: Fine-Grained Post-training Quantization for Vision Transformers. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433: 79-90.
  • [8] Jiang, Yanfeng; Sun, Ning; Xie, Xueshuo; Yang, Fei; Li, Tao. ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers. NEURAL NETWORKS, 2025, 186.
  • [9] Huo, Ying; Kang, Yongqiang; Yang, Dawei; Zhu, Jiahao. AGQB-ViT: Adaptive granularity quantizer with bias for post-training quantization of Vision Transformers. NEUROCOMPUTING, 2025, 637.