A Comprehensive Survey of Transformers for Computer Vision

Cited by: 32
Authors
Jamil, Sonain [1 ]
Piran, Md. Jalil [2 ]
Kwon, Oh-Jin [1 ]
Affiliations
[1] Sejong Univ, Dept Elect Engn, Seoul 05006, South Korea
[2] Sejong Univ, Dept Comp Engn, Seoul 05006, South Korea
Keywords
vision transformers; computer vision; deep learning; image coding; drone imagery; drone surveillance; ANOMALY DETECTION; CLASSIFICATION; IMAGES; CNN;
DOI
10.3390/drones7050287
Chinese Library Classification number
TP7 [Remote sensing technology];
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
As a special type of transformer, vision transformers (ViTs) can be used for various computer vision (CV) applications. Convolutional neural networks (CNNs) have several potential problems that can be resolved with ViTs. For image coding tasks such as compression, super-resolution, segmentation, and denoising, different variants of ViTs are used. In our survey, we determined the many CV applications to which ViTs are applicable. The CV applications reviewed include image classification, object detection, image segmentation, image compression, image super-resolution, image denoising, anomaly detection, and drone imagery. We reviewed the state of the art, compiled a list of available models, and discussed the pros and cons of each model.
Pages: 27
Related papers
50 in total
  • [31] Attention mechanisms in computer vision: A survey
    Meng-Hao Guo
    Tian-Xing Xu
    Jiang-Jiang Liu
    Zheng-Ning Liu
    Peng-Tao Jiang
    Tai-Jiang Mu
    Song-Hai Zhang
    Ralph R. Martin
    Ming-Ming Cheng
    Shi-Min Hu
    Computational Visual Media, 2022, 8 (03) : 331 - 368
  • [32] Attention mechanisms in computer vision: A survey
    Meng-Hao Guo
    Tian-Xing Xu
    Jiang-Jiang Liu
    Zheng-Ning Liu
    Peng-Tao Jiang
    Tai-Jiang Mu
    Song-Hai Zhang
    Ralph R. Martin
    Ming-Ming Cheng
    Shi-Min Hu
    Computational Visual Media, 2022, 8 : 331 - 368
  • [33] Survey of Transformer Research in Computer Vision
    Li, Xiang
    Zhang, Tao
    Zhang, Zhe
    Wei, Hongyang
    Qian, Yurong
    Computer Engineering and Applications, 2023, 59 (01) : 1 - 14
  • [34] A Survey On Graph Matching In Computer Vision
    Sun, Hui
    Zhou, Wenju
    Fei, Minrui
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 225 - 230
  • [35] Geotagging in multimedia and computer vision—a survey
    Jiebo Luo
    Dhiraj Joshi
    Jie Yu
    Andrew Gallagher
    Multimedia Tools and Applications, 2011, 51 : 187 - 211
  • [36] Context understanding in computer vision: A survey
    Wang, Xuan
    Zhu, Zhigang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 229
  • [37] Attention mechanisms in computer vision: A survey
    Guo, Meng-Hao
    Xu, Tian-Xing
    Liu, Jiang-Jiang
    Liu, Zheng-Ning
    Jiang, Peng-Tao
    Mu, Tai-Jiang
    Zhang, Song-Hai
    Martin, Ralph R.
    Cheng, Ming-Ming
    Hu, Shi-Min
    COMPUTATIONAL VISUAL MEDIA, 2022, 8 (03) : 331 - 368
  • [38] A Historical Survey of Geometric Computer Vision
    Sturm, Peter
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS: 14TH INTERNATIONAL CONFERENCE, CAIP 2011, PT I, 2011, 6854 : 1 - 8
  • [39] Fashion Meets Computer Vision: A Survey
    Cheng, Wen-Huang
    Song, Sijie
    Chen, Chieh-Yun
    Hidayati, Shintami Chusnul
    Liu, Jiaying
    ACM COMPUTING SURVEYS, 2021, 54 (04)
  • [40] A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking
    Papa, Lorenzo
    Russo, Paolo
    Amerini, Irene
    Zhou, Luping
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7682 - 7700