A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking

Cited by: 5
Authors
Papa, Lorenzo [1, 2]
Russo, Paolo [1]
Amerini, Irene [1]
Zhou, Luping [2]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, I-00185 Rome, Italy
[2] Univ Sydney, Sch Elect & Informat Engn, Fac Engn, Sydney, NSW 2006, Australia
Keywords
Computer vision; computational efficiency; vision transformer
DOI
10.1109/TPAMI.2024.3392941
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost quadratically increases with the image resolution. Generally speaking, it is challenging to employ these architectures in real-world applications due to many hardware and environmental restrictions, such as processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies to ensure sub-optimal estimation performances. More in detail, four efficient categories will be analyzed: compact architecture, pruning, knowledge distillation, and quantization strategies. Moreover, a new metric called Efficient Error Rate has been introduced in order to normalize and compare models' features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. Summarizing, this paper first mathematically defines the strategies used to make Vision Transformer efficient, describes and discusses state-of-the-art methodologies, and analyzes their performances over different application scenarios. Toward the end of this paper, we also discuss open challenges and promising research directions.
Pages: 7682 - 7700
Number of pages: 19
Related papers
50 in total
  • [21] BinaryViT: Toward Efficient and Accurate Binary Vision Transformers
    Xiao, Junrui
    Li, Zhikai
    Li, Jianquan
    Yang, Lianwei
    Gu, Qingyi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 195 - 206
  • [22] Automated Progressive Learning for Efficient Training of Vision Transformers
    Li, Changlin
    Zhuang, Bohan
    Wang, Guangrun
    Liang, Xiaodan
    Chang, Xiaojun
    Yang, Yi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 12476 - 12486
  • [23] DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
    Chen, Mengzhao
    Shao, Wenqi
    Xu, Peng
    Lin, Mingbao
    Zhang, Kaipeng
    Chao, Fei
    Ji, Rongrong
    Qiao, Yu
    Luo, Ping
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17118 - 17128
  • [24] Parameter-Efficient Model Adaptation for Vision Transformers
    He, Xuehai
    Li, Chuanyuan
    Zhang, Pengchuan
    Yang, Jianwei
    Wang, Xin Eric
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 817 - 825
  • [25] A Survey on Stereo Vision Matching Algorithms
    Zhang, Xiaoxue
    Liu, Zhigang
    2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014, : 2026 - 2031
  • [26] A Survey of Clustering Techniques and Algorithms
    Nisha
    Kaur, Puneet Jai
    2015 2ND INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2015, : 304 - 307
  • [27] Performance characteristics of vision algorithms
    Christensen, HI
    Forstner, W
    MACHINE VISION AND APPLICATIONS, 1997, 9 (5-6) : 215 - 218
  • [28] A survey of the vision transformers and their CNN-transformer based variants
    Khan, Asifullah
    Rauf, Zunaira
    Sohail, Anabia
    Khan, Abdul Rehman
    Asif, Hifsa
    Asif, Aqsa
    Farooq, Umair
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (SUPPL3) : S2917 - S2970
  • [29] A survey of new techniques in insulation monitoring of power transformers
    Ward, BH
    IEEE ELECTRICAL INSULATION MAGAZINE, 2001, 17 (03) : 16 - 23
  • [30] A survey of the vision transformers and their CNN-transformer based variants
    Asifullah Khan
    Zunaira Rauf
    Anabia Sohail
    Abdul Rehman Khan
    Hifsa Asif
    Aqsa Asif
    Umair Farooq
    Artificial Intelligence Review, 2023, 56 : 2917 - 2970