Vision transformer models for mobile/edge devices: a survey

Cited by: 3
Authors
Lee, Seung Il [1 ]
Koo, Kwanghyun [1 ]
Lee, Jong Ho [1 ]
Lee, Gilha [1 ]
Jeong, Sangbeom [1 ]
O, Seongjun [1 ]
Kim, Hyun [1 ]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Res Ctr Elect & Informat Technol, Dept Elect & Informat Engn, 232 Gongneung Ro, Seoul 01811, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Vision transformer; Mobile/edge devices; Survey; NEURAL-NETWORK;
DOI
10.1007/s00530-024-01312-0
CLC number
TP [Automation and computer technology];
Subject classification code
0812;
Abstract
With the rapidly growing demand for high-performance deep learning vision models on mobile and edge devices, this paper emphasizes the importance of compact vision models that deliver high accuracy at a small model size. Building on the success of transformer models in natural language processing and computer vision, it comprehensively examines recent research on redesigning the Vision Transformer (ViT) into compact architectures suitable for mobile/edge devices. Compact ViT models are classified into three major categories: (1) architecture and hierarchy restructuring, (2) encoder block enhancements, and (3) integrated approaches, and each category is reviewed in detail. The paper also analyzes how each method contributes to model performance and computational efficiency, clarifying how ViT models can be implemented efficiently on edge devices. The survey thus offers researchers new insights into the design and implementation of compact ViT models and provides guidelines for optimizing the performance and efficiency of deep learning vision models on edge devices.
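For orientation, the sketch below shows a standard pre-norm ViT encoder block in PyTorch: the unit that category (2), encoder block enhancements, typically redesigns. This is not code from the surveyed paper; the class name, dimensions, and hyperparameters (dim, heads, mlp_ratio) are illustrative assumptions chosen to match a small DeiT-Tiny-scale configuration.

```python
# Minimal sketch of a pre-norm ViT encoder block (illustrative; not from the paper).
import torch
import torch.nn as nn

class ViTEncoderBlock(nn.Module):
    """One transformer encoder block: the component that compact ViT
    variants shrink or restructure for mobile/edge deployment."""
    def __init__(self, dim: int = 192, heads: int = 3, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Multi-head self-attention: cost grows quadratically with the
        # number of tokens, so it is the usual target of mobile redesigns.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # Position-wise MLP; compact variants often reduce mlp_ratio or
        # replace parts of this sub-layer with lightweight convolutions.
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connections around the attention and MLP sub-layers.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x

if __name__ == "__main__":
    tokens = torch.randn(1, 197, 192)  # e.g. 14x14 patches plus a [CLS] token
    print(ViTEncoderBlock()(tokens).shape)  # torch.Size([1, 197, 192])
```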
Pages: 18
Related papers
50 records in total
  • [11] Placement of DNN Models on Mobile Edge Devices for Effective Video Analysis
    Constantinou, George
    Shahabi, Cyrus
    Kim, Seon Ho
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 207 - 218
  • [12] Edge Caching for Mobile Devices
    Azeem, Muhammad Rameez
    Muzammal, Syeda Mariam
    Zaman, Noor
    Khan, Muhammad Asghar
    2022 14TH INTERNATIONAL CONFERENCE ON MATHEMATICS, ACTUARIAL SCIENCE, COMPUTER SCIENCE AND STATISTICS (MACS), 2022,
  • [13] Survey of Transformer Research in Computer Vision
    Li, Xiang
    Zhang, Tao
    Zhang, Zhe
    Wei, Hongyang
    Qian, Yurong
    Computer Engineering and Applications, 2023, 59 (01) : 1 - 14
  • [14] Middleware for Edge Devices in Mobile Edge Computing
    Pandey, Manish
    Kwon, Young-Woo
    2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021,
  • [15] Optimising TinyML with quantization and distillation of transformer and mamba models for indoor localisation on edge devices
    Suwannaphong, Thanaphon
    Jovan, Ferdian
    Craddock, Ian
    Mcconville, Ryan
    SCIENTIFIC REPORTS, 2025, 15 (01),
  • [16] Survey of Vision Transformer in Low-Level Computer Vision
    Zhu, Kai
    Li, Li
    Zhang, Tong
    Jiang, Sheng
    Bie, Yiming
    Computer Engineering and Applications, 2024, 60 (04) : 39 - 56
  • [17] Mobile Edge Computing: A Survey
    Abbas, Nasir
    Zhang, Yan
    Taherkordi, Amir
    Skeie, Tor
    IEEE INTERNET OF THINGS JOURNAL, 2018, 5 (01): : 450 - 465
  • [18] A Survey on Mobile Edge Computing
    Ahmed, Arif
    Ahmed, Ejaz
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO'16), 2016,
  • [19] ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
    Tang, Chen
    Zhang, Li Lyna
    Jiang, Huiqiang
    Xu, Jiahang
    Cao, Ting
    Zhang, Quanlu
    Yang, Yuqing
    Wang, Zhi
    Yang, Mao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5806 - 5817