Vision transformer models for mobile/edge devices: a survey

Cited by: 3
Authors
Lee, Seung Il [1]
Koo, Kwanghyun [1]
Lee, Jong Ho [1]
Lee, Gilha [1]
Jeong, Sangbeom [1]
O, Seongjun [1]
Kim, Hyun [1]
Affiliation
[1] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Dept Elect & Informat Engn, 232 Gongneung Ro, Seoul 01811, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Vision transformer; Mobile/edge devices; Survey; Neural network
DOI
10.1007/s00530-024-01312-0
CLC Number
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
With the rapidly growing demand for high-performance deep learning vision models on mobile and edge devices, compact vision models that deliver high accuracy at a small model size have become essential. Building on the success of transformer models in natural language processing and computer vision, this paper offers a comprehensive examination of recent research on redesigning the Vision Transformer (ViT) into compact architectures suitable for mobile/edge devices. It classifies compact ViT models into three major categories: (1) architecture and hierarchy restructuring, (2) encoder block enhancements, and (3) integrated approaches, and provides a detailed overview of each. It also analyzes each method's contribution to model performance and computational efficiency, offering a deeper understanding of how to implement ViT models efficiently on edge devices. The survey thus gives researchers new insights into the design and implementation of compact ViT models and provides guidelines for optimizing the performance and efficiency of deep learning vision models on edge devices.
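For context, the "encoder block enhancements" category in the abstract targets the standard pre-norm ViT encoder block: multi-head self-attention followed by an MLP, each wrapped in a residual connection. The PyTorch sketch below is illustrative only and not taken from the surveyed paper; the token dimension (192) and head count (3) are assumed DeiT-Tiny-style defaults.

    # Illustrative baseline block, not code from the surveyed paper.
    import torch
    import torch.nn as nn

    class ViTEncoderBlock(nn.Module):
        """One pre-norm ViT encoder block: LayerNorm -> multi-head
        self-attention -> residual, then LayerNorm -> MLP -> residual."""

        def __init__(self, dim: int = 192, num_heads: int = 3, mlp_ratio: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim * mlp_ratio),
                nn.GELU(),
                nn.Linear(dim * mlp_ratio, dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, dim), e.g., 196 patch tokens + 1 class token
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.norm2(x))
            return x

    tokens = torch.randn(1, 197, 192)
    print(ViTEncoderBlock()(tokens).shape)  # torch.Size([1, 197, 192])

Compact-ViT methods in categories (2) and (3) typically replace the attention or MLP sub-layers of such a block with cheaper variants (e.g., convolution-augmented or linearized attention) to reduce parameters and FLOPs.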
Pages: 18
Related Papers
50 records
  • [41] Security Issues of Mobile Devices: A Survey
    Helm, Gary
    Chowdhury, Md Minhaz
    2021 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2021, : 14 - 20
  • [42] Energy Bugs in Mobile Devices: A Survey
    Demidem, Amine
    Elmiligi, Haytham
    Gebali, Fayez
    2015 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2015, : 513 - 517
  • [43] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [44] Poster: Profiling Event Vision Processing on Edge Devices
    Gokarn, Ila
    Misra, Archan
PROCEEDINGS OF THE 22ND ANNUAL INTERNATIONAL CONFERENCE ON MOBILE SYSTEMS, APPLICATIONS AND SERVICES, MOBISYS 2024, 2024, : 672 - 673
  • [45] METER: A Mobile Vision Transformer Architecture for Monocular Depth Estimation
    Papa, Lorenzo
    Russo, Paolo
    Amerini, Irene
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5882 - 5893
  • [46] MixMobileNet: A Mixed Mobile Network for Edge Vision Applications
    Meng, Yanju
    Wu, Peng
    Feng, Jian
    Zhang, Xiaoming
    ELECTRONICS, 2024, 13 (03)
  • [47] An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices
    Lee, Juhyeon
    Bahk, Insung
    Kim, Hoseung
    Jeong, Sinjin
    Lee, Suyeon
    Min, Donghyun
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 50 - 61
  • [48] Efficient Image Captioning Based on Vision Transformer Models
    Elbedwehy, Samar
    Medhat, T.
    Hamza, Taher
    Alrahmawy, Mohammed F.
CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (01): 1483 - 1500
  • [49] Deepfake Image Detection using Vision Transformer Models
    Ghita, Bogdan
    Kuzminykh, Ievgeniia
    Usama, Abubakar
    Bakhshi, Taimur
    Marchang, Jims
    2024 IEEE INTERNATIONAL BLACK SEA CONFERENCE ON COMMUNICATIONS AND NETWORKING, BLACKSEACOM 2024, 2024, : 332 - 335
  • [50] Quantifying interpretation reproducibility in Vision Transformer models with TAVAC
    Zhao, Yue
    Agyemang, Dylan
    Liu, Yang
    Mahoney, Matt
    Li, Sheng
SCIENCE ADVANCES, 2024, 10 (51)