Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers With Bridge Block Reconstruction for IoT Systems

Cited by: 2
Authors
Lee, Jemin [1 ]
Kwon, Yongin [1 ]
Park, Sihyeong [2 ]
Yu, Misun [1 ]
Park, Jeman [1 ]
Song, Hwanjun [3 ]
Affiliations
[1] Elect & Telecommun Res Inst, Artificial Intelligence Comp Res Lab, Daejeon 34129, South Korea
[2] Korea Elect Technol Inst, SoC Platform Res Ctr, Seongnam 13509, South Korea
[3] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, Daejeon 34141, South Korea
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 22
Keywords
Transformers; Quantization (signal); Bridges; Computer architecture; Convolution; Computational modeling; Internet of Things; Model compression; posttraining quantization (PTQ); vision transformer (ViT);
DOI
10.1109/JIOT.2024.3403844
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread deployment. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers and optimize attention computation for linear complexity. Additionally, posttraining quantization (PTQ) has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this article, we discover that applying existing PTQ methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributable to the following four challenges: 1) highly dynamic ranges; 2) zero-point overflow; 3) diverse normalization; and 4) limited model parameters (<5M). To overcome these challenges, we propose a new PTQ method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, and EfficientFormerV2). Compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT), we achieve significant average improvements of 17.73% for 8-bit and 29.75% for 6-bit quantization. We plan to release our code at https://gitlab.com/ones-ai/q-hyvit.
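The abstract's "zero-point overflow" challenge can be made concrete with standard asymmetric uniform quantization. The sketch below is a minimal Python illustration, not code from the paper: the activation range is hypothetical and the helper names are our own. When an activation's range excludes zero (here, an all-negative range), the derived zero-point lands outside the 8-bit integer grid and must be clamped back onto it, which shifts the representable range and introduces systematic error at one end.

    import numpy as np

    def asymmetric_qparams(x_min, x_max, n_bits=8):
        # Textbook asymmetric uniform quantization (not the paper's method):
        # map the observed range [x_min, x_max] onto the grid [0, 2^b - 1].
        qmin, qmax = 0, (1 << n_bits) - 1
        scale = (x_max - x_min) / (qmax - qmin)
        zero_point = int(round(qmin - x_min / scale))
        overflow = not (qmin <= zero_point <= qmax)
        return scale, zero_point, overflow

    def fake_quant(x, scale, zero_point, n_bits=8):
        # Quantize then dequantize so the rounding/clipping error is visible.
        qmax = (1 << n_bits) - 1
        q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

    # Hypothetical activation whose range excludes zero, as can occur around
    # the bridge between convolutional and transformer stages in hybrid ViTs.
    x = np.random.uniform(-40.0, -0.5, size=1000)
    scale, zp, overflow = asymmetric_qparams(x.min(), x.max())
    print(f"zero_point={zp}, overflow={overflow}")  # zp > 255 here
    zp_clamped = min(max(zp, 0), 255)               # what an int8 kernel must do
    x_hat = fake_quant(x, scale, zp_clamped)
    print(f"mean abs error after clamping: {np.abs(x - x_hat).mean():.4f}")

Because the clamped zero-point no longer matches the observed range, values at the far end of the distribution all collapse to the same integer code, which is one way the challenges enumerated above translate into the drastic accuracy drops the abstract reports when conventional PTQ calibration is applied unchanged to hybrid ViTs.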
Pages: 36384-36396 (13 pages)
Related Papers (9 records)
  • [1] Wu, Zhuguanyu; Chen, Jiaxin; Zhong, Hanwen; Huang, Di; Wang, Yunhong. AdaLog: Post-training Quantization for Vision Transformers with Adaptive Logarithm Quantizer. COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085: 411-427.
  • [2] Yuan, Zhihang; Xue, Chenhao; Chen, Yiqi; Wu, Qiang; Sun, Guangyu. PTQ4ViT: Post-training Quantization for Vision Transformers with Twin Uniform Quantization. COMPUTER VISION, ECCV 2022, PT XII, 2022, 13672: 191-207.
  • [3] Zhang, Weixing; Tian, Zhuang; Lin, Nan; Yang, Cong; Chen, Yongxia. Hessian matrix-aware comprehensive post-training quantization for vision transformers. JOURNAL OF ELECTRONIC IMAGING, 2025, 34 (01).
  • [4] Zhong, Yunshan; Huang, You; Hu, Jiawei; Zhang, Yuxin; Ji, Rongrong. Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04): 2676-2692.
  • [5] Li, Zhikai; Xiao, Junrui; Yang, Lianwei; Gu, Qingyi. RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 17181-17190.
  • [6] Liu, Yijiang; Yang, Huanrui; Dong, Zhen; Keutzer, Kurt; Du, Li; Zhang, Shanghang. NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 20321-20330.
  • [7] Liu, Caihua; Shi, Hongyang; He, Xinyu. FGPTQ-ViT: Fine-Grained Post-training Quantization for Vision Transformers. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433: 79-90.
  • [8] Jiang, Yanfeng; Sun, Ning; Xie, Xueshuo; Yang, Fei; Li, Tao. ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers. NEURAL NETWORKS, 2025, 186.
  • [9] Huo, Ying; Kang, Yongqiang; Yang, Dawei; Zhu, Jiahao. AGQB-ViT: Adaptive granularity quantizer with bias for post-training quantization of Vision Transformers. NEUROCOMPUTING, 2025, 637.