Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?

Cited by: 49
Authors
Moutik, Oumaima [1]
Sekkat, Hiba [1]
Tigani, Smail [1]
Chehri, Abdellah [2]
Saadane, Rachid [3]
Tchakoucht, Taha Ait [1]
Paul, Anand [4]
Affiliations
[1] Euro Mediterranean Univ, Euromed Res Ctr, Engn Unit, Fes 30030, Morocco
[2] Royal Mil Coll Canada, Dept Math & Comp Sci, Kingston, ON K7K 7B4, Canada
[3] Hassania Sch Publ Works, SIRC LaGeS, Casablanca 8108, Morocco
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu 41566, South Korea
Keywords
convolutional neural networks; vision transformers; recurrent neural networks; conversational systems; action recognition; natural language understanding; action recognitions; COMPUTER VISION; ATTENTION;
DOI
10.3390/s23020734
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Subject classification codes
070302 ; 081704 ;
Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of extensive research over the last decades. Convolutional neural networks (CNNs) are a central component of this research and have played a crucial role in the success of deep learning. Inspired by the human visual system, CNNs have been applied to visual data and have addressed many challenges across computer vision tasks and image/video analysis, including action recognition (AR). More recently, following the success of the transformer in natural language processing (NLP), transformers have begun to set new trends in vision tasks, prompting a debate over whether Vision Transformer (ViT) models will replace CNNs for action recognition in video clips. This paper examines this topic in detail: it studies CNNs and Transformers for action recognition separately and then compares their accuracy-complexity trade-off. Finally, based on the outcome of the performance analysis, it discusses whether CNNs or Vision Transformers will win the race.
Pages: 21
Related papers (50 in total)
  • [21] Skin Lesion Segmentation Based on Vision Transformers and Convolutional Neural Networks-A Comparative Study
    Gulzar, Yonis
    Khan, Sumeer Ahmad
    APPLIED SCIENCES-BASEL, 2022, 12 (12)
  • [22] Comparative Analysis of Vision Transformers and Conventional Convolutional Neural Networks in Detecting Referable Diabetic Retinopathy
    Goh, Jocelyn Hui Lin
    Ang, Elroy
    Srinivasan, Sahana
    Lei, Xiaofeng
    Loh, Johnathan
    Quek, Ten Cheer
    Xue, Cancan
    Xu, Xinxing
    Liu, Yong
    Cheng, Ching-Yu
    Rajapakse, Jagath C.
    Tham, Yih-Chung
    OPHTHALMOLOGY SCIENCE, 2024, 4 (06)
  • [23] Convolutional Neural Networks for 3D Vision System Data
    O'Mahony, Niall
    Campbell, Sean
    Krpalkova, Lenka
    Carvalho, Anderson
    Velasco-Hernandez, Gustavo Adolfo
    Riordan, Daniel
    Walsh, Joseph
    2018 12TH INTERNATIONAL CONFERENCE ON SENSING TECHNOLOGY (ICST), 2018, : 160 - 165
  • [24] Synthetic Training Data Generation for Convolutional Neural Networks in Vision Applications
    Vietz, Hannes
    Rauch, Tristan
    Weyrich, Michael
    2022 IEEE 27TH INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2022
  • [25] Novel applications of Convolutional Neural Networks in the age of Transformers
    Ersavas, Tansel
    Smith, Martin A.
    Mattick, John S.
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [26] Ensembles of Convolutional Neural Networks and Transformers for Polyp Segmentation
    Nanni, Loris
    Fantozzi, Carlo
    Loreggia, Andrea
    Lumini, Alessandra
    SENSORS, 2023, 23 (10)
  • [27] Brain Tumor Diagnosis Using Machine Learning, Convolutional Neural Networks, Capsule Neural Networks and Vision Transformers, Applied to MRI: A Survey
    Akinyelu, Andronicus A.
    Zaccagna, Fulvio
    Grist, James T.
    Castelli, Mauro
    Rundo, Leonardo
    JOURNAL OF IMAGING, 2022, 8 (08)
  • [28] Data-Driven Deep Convolutional Neural Networks for Electromagnetic Field Estimation of Transformers
    Chen, Yifan
    Yang, Qingxin
    Li, Yongjian
    Zhang, Hao
    Zhang, Changgeng
    IEEE TRANSACTIONS ON APPLIED SUPERCONDUCTIVITY, 2024, 34 (08) : 1 - 4
  • [29] Convolutional Neural Networks and Vision Transformers in Product GS1 GPC Brick Code Recognition
    Szymkowski, Maciej
    Niemir, Maciej
    Mrugalska, Beata
    Saeed, Khalid
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 440 - 450
  • [30] Diabetic Foot Ulcers Detection Model Using a Hybrid Convolutional Neural Networks-Vision Transformers
    Sait, Abdul Rahaman Wahab
    Nagaraj, Ramprasad
    DIAGNOSTICS, 2025, 15 (06)