Vision transformer models for mobile/edge devices: a survey

Cited by: 3
Authors
Lee, Seung Il [1 ]
Koo, Kwanghyun [1 ]
Lee, Jong Ho [1 ]
Lee, Gilha [1 ]
Jeong, Sangbeom [1 ]
O, Seongjun [1 ]
Kim, Hyun [1 ]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Dept Elect & Informat Engn, 232 Gongneung Ro, Seoul 01811, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Vision transformer; Mobile/edge devices; Survey; NEURAL-NETWORK;
DOI
10.1007/s00530-024-01312-0
CLC Number
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
With the rapidly growing demand for high-performance deep learning vision models on mobile and edge devices, this paper emphasizes the importance of compact vision models that deliver high accuracy while maintaining a small model size. Building on the success of transformer models in natural language processing and computer vision, it provides a comprehensive examination of the latest research on redesigning the Vision Transformer (ViT) into compact architectures suitable for mobile/edge devices. Compact ViT models are classified into three major categories: (1) architecture and hierarchy restructuring, (2) encoder block enhancements, and (3) integrated approaches, and each category is reviewed in detail. The paper also analyzes how each method contributes to model performance and computational efficiency, providing a deeper understanding of how to implement ViT models efficiently on edge devices. As a result, it offers new insights into the design and implementation of compact ViT models for researchers in this field and provides guidelines for optimizing the performance and efficiency of deep learning vision models on edge devices.
Pages: 18
Related Papers
50 records
  • [1] Vision transformer models for mobile/edge devices: a survey
    Seung Il Lee
    Kwanghyun Koo
    Jong Ho Lee
    Gilha Lee
    Sangbeom Jeong
    Seongjun O
    Hyun Kim
    Multimedia Systems, 2024, 30
  • [2] PMVT: a lightweight vision transformer for plant disease identification on mobile devices
    Li, Guoqiang
    Wang, Yuchao
    Zhao, Qing
    Yuan, Peiyan
    Chang, Baofang
    FRONTIERS IN PLANT SCIENCE, 2023, 14
  • [3] A Survey on Vision Transformer
    Han, Kai
    Wang, Yunhe
    Chen, Hanting
    Chen, Xinghao
    Guo, Jianyuan
    Liu, Zhenhua
    Tang, Yehui
    Xiao, An
    Xu, Chunjing
    Xu, Yixing
    Yang, Zhaohui
    Zhang, Yiman
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 87 - 110
  • [4] Towards Efficient Vision Transformer Inference: A First Study of Transformers on Mobile Devices
    Wang, Xudong
    Zhang, Li Lyna
    Wang, Yang
    Yang, Mao
    PROCEEDINGS OF THE 23RD ANNUAL INTERNATIONAL WORKSHOP ON MOBILE COMPUTING SYSTEMS AND APPLICATIONS (HOTMOBILE '22), 2022, : 1 - 7
  • [5] eViTBins: Edge-Enhanced Vision-Transformer Bins for Monocular Depth Estimation on Edge Devices
    She, Yutong
    Li, Peng
    Wei, Mingqiang
    Liang, Dong
    Chen, Yiping
    Xie, Haoran
    Wang, Fu Lee
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (12) : 20320 - 20334
  • [6] Neurobiological and Neurocognitive Models of Vision for Touch Input on Mobile Devices
    Schipor, Maria Doina
    Vatavu, Radu-Daniel
    2017 IEEE INTERNATIONAL CONFERENCE ON E-HEALTH AND BIOENGINEERING CONFERENCE (EHB), 2017, : 353 - 356
  • [7] ViT4Mal: Lightweight Vision Transformer for Malware Detection on Edge Devices
    Ravi, Akshara
    Chaturvedi, Vivek
    Shafique, Muhammad
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2023, 22 (05)
  • [8] FactionFormer: Context-Driven Collaborative Vision Transformer Models for Edge Intelligence
    Nimi, Sumaiya Tabassum
    Arefeen, Md Adnan
    Uddin, Md Yusuf Sarwar
    Debnath, Biplob
    Chakradhar, Srimat
    2023 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING, SMARTCOMP, 2023, : 349 - 354
  • [9] Augmenting computing capabilities at the edge by jointly exploiting mobile devices: A survey
    Hirsch, Matias
    Mateos, Cristian
    Zunino, Alejandro
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 88 : 644 - 662
  • [10] Retraining-free Constraint-aware Token Pruning for Vision Transformer on Edge Devices
    Yu, Yun-Chia
    Weng, Mao-Chi
    Lin, Ming-Guang
    Wu, An-Yeu
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024