Vision transformer models for mobile/edge devices: a survey

Cited by: 3
Authors
Lee, Seung Il [1]
Koo, Kwanghyun [1]
Lee, Jong Ho [1]
Lee, Gilha [1]
Jeong, Sangbeom [1]
O, Seongjun [1]
Kim, Hyun [1]
Affiliation
[1] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Dept Elect & Informat Engn, 232 Gongneung Ro, Seoul 01811, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Vision transformer; Mobile/edge devices; Survey; Neural network
DOI
10.1007/s00530-024-01312-0
CLC Number
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
With the rapidly growing demand for high-performance deep learning vision models on mobile and edge devices, compact vision models that deliver high accuracy at a small model size have become essential. Building on the success of transformer models in natural language processing and computer vision, this paper offers a comprehensive examination of recent research on redesigning the Vision Transformer (ViT) into compact architectures suitable for mobile/edge devices. It classifies compact ViT models into three major categories: (1) architecture and hierarchy restructuring, (2) encoder block enhancements, and (3) integrated approaches, and provides a detailed overview of each. It also analyzes each method's contribution to model performance and computational efficiency, offering a deeper understanding of how to implement ViT models efficiently on edge devices. The survey thus gives researchers new insights into the design and implementation of compact ViT models and provides guidelines for optimizing the performance and efficiency of deep learning vision models on edge devices.
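For context, the "encoder block enhancements" category in the abstract targets the standard pre-norm ViT encoder block: multi-head self-attention followed by an MLP, each wrapped in a residual connection. The PyTorch sketch below is illustrative only and not taken from the surveyed paper; the token dimension (192) and head count (3) are assumed DeiT-Tiny-style defaults.

    # Illustrative baseline block, not code from the surveyed paper.
    import torch
    import torch.nn as nn

    class ViTEncoderBlock(nn.Module):
        """One pre-norm ViT encoder block: LayerNorm -> multi-head
        self-attention -> residual, then LayerNorm -> MLP -> residual."""

        def __init__(self, dim: int = 192, num_heads: int = 3, mlp_ratio: int = 4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim * mlp_ratio),
                nn.GELU(),
                nn.Linear(dim * mlp_ratio, dim),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_tokens, dim), e.g., 196 patch tokens + 1 class token
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.norm2(x))
            return x

    tokens = torch.randn(1, 197, 192)
    print(ViTEncoderBlock()(tokens).shape)  # torch.Size([1, 197, 192])

Compact-ViT methods in categories (2) and (3) typically replace the attention or MLP sub-layers of such a block with cheaper variants (e.g., convolution-augmented or linearized attention) to reduce parameters and FLOPs.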
Pages: 18
Related Papers
50 records
  • [41] Security Issues of Mobile Devices: A Survey
    Helm, Gary
    Chowdhury, Md Minhaz
    2021 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2021, : 14 - 20
  • [42] Energy Bugs in Mobile Devices: A Survey
    Demidem, Amine
    Elmiligi, Haytham
    Gebali, Fayez
    2015 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2015, : 513 - 517
  • [43] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [44] Poster: Profiling Event Vision Processing on Edge Devices
    Gokarn, Ila
    Misra, Archan
PROCEEDINGS OF THE 22ND ANNUAL INTERNATIONAL CONFERENCE ON MOBILE SYSTEMS, APPLICATIONS AND SERVICES, MOBISYS 2024, 2024, : 672 - 673
  • [45] METER: A Mobile Vision Transformer Architecture for Monocular Depth Estimation
    Papa, Lorenzo
    Russo, Paolo
    Amerini, Irene
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5882 - 5893
  • [46] MixMobileNet: A Mixed Mobile Network for Edge Vision Applications
    Meng, Yanju
    Wu, Peng
    Feng, Jian
    Zhang, Xiaoming
    ELECTRONICS, 2024, 13 (03)
  • [47] An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices
    Lee, Juhyeon
    Bahk, Insung
    Kim, Hoseung
    Jeong, Sinjin
    Lee, Suyeon
    Min, Donghyun
    PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024, 2024, : 50 - 61
  • [48] Efficient Image Captioning Based on Vision Transformer Models
    Elbedwehy, Samar
    Medhat, T.
    Hamza, Taher
    Alrahmawy, Mohammed F.
CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (01): 1483 - 1500
  • [49] Deepfake Image Detection using Vision Transformer Models
    Ghita, Bogdan
    Kuzminykh, Ievgeniia
    Usama, Abubakar
    Bakhshi, Taimur
    Marchang, Jims
    2024 IEEE INTERNATIONAL BLACK SEA CONFERENCE ON COMMUNICATIONS AND NETWORKING, BLACKSEACOM 2024, 2024, : 332 - 335
  • [50] Quantifying interpretation reproducibility in Vision Transformer models with TAVAC
    Zhao, Yue
    Agyemang, Dylan
    Liu, Yang
    Mahoney, Matt
    Li, Sheng
SCIENCE ADVANCES, 2024, 10 (51)