A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking

Cited by: 5

Authors:

Papa, Lorenzo [1,2]

Russo, Paolo [1]

Amerini, Irene [1]

Zhou, Luping [2]

Affiliations:

[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, I-00185 Rome, Italy

[2] Univ Sydney, Sch Elect & Informat Engn, Fac Engn, Sydney, NSW 2006, Australia

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2024 / Vol. 46 / No. 12

Keywords:

Computer vision; computational efficiency; vision transformer

DOI:

10.1109/TPAMI.2024.3392941

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost quadratically increases with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies to ensure sub-optimal estimation performances. More in detail, four efficient categories will be analyzed: compact architecture, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called Efficient Error Rate has been introduced in order to normalize and compare models' features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. Summarizing, this paper first mathematically defines the strategies used to make Vision Transformer efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios. Toward the end of this paper, we also discuss open challenges and promising research directions.

Pages: 7682 - 7700

Number of pages: 19

50 items in total

[21] BinaryViT: Toward Efficient and Accurate Binary Vision Transformers
Xiao, Junrui
Li, Zhikai
Li, Jianquan
Yang, Lianwei
Gu, Qingyi
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 195 - 206
[22] Automated Progressive Learning for Efficient Training of Vision Transformers
Li, Changlin
Zhuang, Bohan
Wang, Guangrun
Liang, Xiaodan
Chang, Xiaojun
Yang, Yi
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 12476 - 12486
[23] DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
Chen, Mengzhao
Shao, Wenqi
Xu, Peng
Lin, Mingbao
Zhang, Kaipeng
Chao, Fei
Ji, Rongrong
Qiao, Yu
Luo, Ping
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 17118 - 17128
[24] Parameter-Efficient Model Adaptation for Vision Transformers
He, Xuehai
Li, Chuanyuan
Zhang, Pengchuan
Yang, Jianwei
Wang, Xin Eric
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023: 817 - 825
[25] A Survey on Stereo Vision Matching Algorithms
Zhang, Xiaoxue
Liu, Zhigang
2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014: 2026 - 2031
[26] A Survey of Clustering Techniques and Algorithms
Nisha
Kaur, Puneet Jai
2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015: 304 - 307
[27] Performance characteristics of vision algorithms
Christensen, HI
Forstner, W
MACHINE VISION AND APPLICATIONS, 1997, 9 (5-6) : 215 - 218
[28] A survey of the vision transformers and their CNN-transformer based variants
Khan, Asifullah
Rauf, Zunaira
Sohail, Anabia
Khan, Abdul Rehman
Asif, Hifsa
Asif, Aqsa
Farooq, Umair
ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (SUPPL3) : S2917 - S2970
[29] A survey of new techniques in insulation monitoring of power transformers
Ward, BH
IEEE ELECTRICAL INSULATION MAGAZINE, 2001, 17 (03) : 16 - 23
[30] A survey of the vision transformers and their CNN-transformer based variants
Asifullah Khan
Zunaira Rauf
Anabia Sohail
Abdul Rehman Khan
Hifsa Asif
Aqsa Asif
Umair Farooq
Artificial Intelligence Review, 2023, 56 : 2917 - 2970