A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking

Cited by: 5
Authors
Papa, Lorenzo [1 ,2 ]
Russo, Paolo [1 ]
Amerini, Irene [1 ]
Zhou, Luping [2 ]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Control & Management Engn, I-00185 Rome, Italy
[2] Univ Sydney, Sch Elect & Informat Engn, Fac Engn, Sydney, NSW 2006, Australia
Keywords
Computer vision; computational efficiency; vision transformer;
DOI
10.1109/TPAMI.2024.3392941
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformer (ViT) architectures are becoming increasingly popular and are widely employed in computer vision applications. Their main feature is the ability to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, the deployment cost of ViTs has grown steadily with their size, number of trainable parameters, and operations. Furthermore, the computational and memory cost of self-attention increases quadratically with image resolution. Generally speaking, these architectures are challenging to employ in real-world applications because of hardware and environmental restrictions, such as limited processing and computational capabilities. Therefore, this survey investigates the most efficient methodologies that trade marginally sub-optimal estimation performance for lower resource usage. In more detail, four categories of efficiency strategies are analyzed: compact architectures, pruning, knowledge distillation, and quantization. Moreover, a new metric called Efficient Error Rate is introduced to normalize and compare the model features that affect hardware devices at inference time, such as the number of parameters, bits, FLOPs, and model size. In summary, this paper first mathematically defines the strategies used to make Vision Transformers efficient, then describes and discusses state-of-the-art methodologies, and analyzes their performance over different application scenarios. Toward the end of the paper, we also discuss open challenges and promising research directions.
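The abstract mentions two quantitative points: the quadratic growth of self-attention cost with image resolution, and a normalized efficiency metric (the Efficient Error Rate) built from parameters, bits, FLOPs, and model size. The Python sketch below only illustrates these two ideas; it is not taken from the paper. The patch size (16), embedding dimension (768), and the ViT-Base-like reference numbers are illustrative assumptions, and the toy score is a placeholder stand-in rather than the survey's actual EER definition.

```python
# Minimal sketch (not from the paper): (1) how dense self-attention cost scales
# quadratically with resolution, and (2) a toy normalized efficiency score in
# the spirit of the Efficient Error Rate described in the abstract.
# All constants below are illustrative assumptions.

def attention_cost(h_pixels: int, w_pixels: int, patch: int = 16, dim: int = 768) -> int:
    """Approximate multiply-adds of one dense self-attention layer: O(N^2 * d),
    where N is the number of patch tokens (projections are ignored)."""
    n_tokens = (h_pixels // patch) * (w_pixels // patch)
    # QK^T score matrix plus attention-weighted values: 2 * N^2 * d operations.
    return 2 * n_tokens * n_tokens * dim


def toy_efficiency_score(params: float, flops: float, bits: float, size_mb: float,
                         ref: dict) -> float:
    """Average of each cost factor normalized by a reference model; lower is
    cheaper. Placeholder for the paper's EER metric, whose exact weighting
    is defined in the survey itself."""
    ratios = [
        params / ref["params"],
        flops / ref["flops"],
        bits / ref["bits"],
        size_mb / ref["size_mb"],
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Quadratic growth: doubling the resolution (4x the tokens) costs ~16x more.
    print(attention_cost(224, 224))   # ~5.9e7 ops per layer at 224x224
    print(attention_cost(448, 448))   # ~9.4e8 ops per layer at 448x448

    # Hypothetical ViT-Base-like reference vs. a pruned/quantized variant.
    ref = {"params": 86e6, "flops": 17.6e9, "bits": 32, "size_mb": 330}
    score = toy_efficiency_score(22e6, 4.6e9, 8, 22, ref)
    print(f"toy efficiency score: {score:.2f}")  # < 1.0 means cheaper than the reference
```

As a usage note, a score of roughly 0.21 for the hypothetical compressed variant would indicate it is about five times cheaper than the reference across the averaged cost factors, which is the kind of cross-model comparison the normalized metric is meant to enable.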
Pages: 7682-7700
Number of pages: 19
Related Papers
50 records in total
  • [1] Transformers in Vision: A Survey
    Khan, Salman
    Naseer, Muzammal
    Hayat, Munawar
    Zamir, Syed Waqas
    Khan, Fahad Shahbaz
    Shah, Mubarak
    ACM COMPUTING SURVEYS, 2022, 54 (10S)
  • [2] Efficient Transformers: A Survey
    Tay, Yi
    Dehghani, Mostafa
    Bahri, Dara
    Metzler, Donald
    ACM COMPUTING SURVEYS, 2023, 55 (06)
  • [3] Vision Transformers in Image Restoration: A Survey
    Ali, Anas M.
    Benjdira, Bilel
    Koubaa, Anis
    El-Shafai, Walid
    Khan, Zahid
    Boulila, Wadii
    SENSORS, 2023, 23 (05)
  • [4] Vision transformers for dense prediction: A survey
    Zuo, Shuangquan
    Xiao, Yun
    Chang, Xiaojun
    Wang, Xuanhong
    KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [5] A Comprehensive Survey of Transformers for Computer Vision
    Jamil, Sonain
    Piran, Md. Jalil
    Kwon, Oh-Jin
    DRONES, 2023, 7 (05)
  • [6] Patch Slimming for Efficient Vision Transformers
    Tang, Yehui
    Han, Kai
    Wang, Yunhe
    Xu, Chang
    Guo, Jianyuan
    Xu, Chao
    Tao, Dacheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 12155 - 12164
  • [7] Efficient Vision Transformers with Partial Attention
    Vo, Xuan-Thuy
    Nguyen, Duy-Linh
    Priadana, Adri
    Jo, Kang-Hyun
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141 : 298 - 317
  • [8] A Survey on Efficient Training of Transformers
    Zhuang, Bohan
    Liu, Jing
    Pan, Zizheng
    He, Haoyu
    Weng, Yuetian
    Shen, Chunhua
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6823 - 6831
  • [9] A survey of techniques for designing I/O-efficient algorithms
    Maheshwari, A
    Zeh, N
    ALGORITHMS FOR MEMORY HIERARCHIES: ADVANCED LECTURES, 2003, 2625 : 36 - 61
  • [10] Vision Transformers for Image Classification: A Comparative Survey
    Wang, Yaoli
    Deng, Yaojun
    Zheng, Yuanjin
    Chattopadhyay, Pratik
    Wang, Lipo
    TECHNOLOGIES, 2025, 13 (01)