Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?

Cited by: 49
Authors
Moutik, Oumaima [1]
Sekkat, Hiba [1]
Tigani, Smail [1]
Chehri, Abdellah [2]
Saadane, Rachid [3]
Tchakoucht, Taha Ait [1]
Paul, Anand [4]
Affiliations
[1] Euro Mediterranean Univ, Euromed Res Ctr, Engn Unit, Fes 30030, Morocco
[2] Royal Mil Coll Canada, Dept Math & Comp Sci, Kingston, ON K7K 7B4, Canada
[3] Hassania Sch Publ Works, SIRC LaGeS, Casablanca 8108, Morocco
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu 41566, South Korea
Keywords
convolutional neural networks; vision transformers; recurrent neural networks; conversational systems; action recognition; natural language understanding; computer vision; attention
DOI
10.3390/s23020734
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of extensive research over the last decades. Convolutional neural networks (CNNs) are a significant component of this topic and have played a crucial role in the prominence of deep learning. Inspired by the human visual system, CNNs have been applied to visual data and have addressed challenges across computer vision tasks and video/image analysis, including action recognition (AR). More recently, following the success of the transformer in natural language processing (NLP), transformers have begun to set new trends in vision tasks, prompting a debate over whether Vision Transformer (ViT) models will replace CNNs for action recognition in video clips. This paper examines the topic in detail: it studies CNNs and Transformers for action recognition separately, then compares them in terms of the accuracy-complexity trade-off. Finally, based on the outcome of the performance analysis, it discusses whether CNNs or Vision Transformers will win the race.
Pages: 21
Related papers
50 in total
  • [11] Robustness and Explainability of Visual Transformers and Convolutional Neural Networks in Glioma Radiogenomics Tasks
    Takahashi, Satoshi
    Takahashi, Masamichi
    Kinoshita, Manabu
    Miyake, Mototaka
    Kobayashi, Kazuma
    Sese, Jun
    Ichimura, Koichi
    Narita, Yoshitaka
    Hamamoto, Ryuji
    CANCER SCIENCE, 2024, 115 : 1544 - 1544
  • [12] Automatic Microstructural Classification of Ultrahigh Carbon Steel with Vision Transformers and Convolutional Neural Networks
    Liu, Xiu
    Aldrich, Chris
    IFAC PAPERSONLINE, 2024, 58 (22) : 119 - 123
  • [13] Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review
    Takahashi, Satoshi
    Sakaguchi, Yusuke
    Kouno, Nobuji
    Takasawa, Ken
    Ishizu, Kenichi
    Akagi, Yu
    Aoyama, Rina
    Teraya, Naoki
    Bolatkan, Amina
    Shinkai, Norio
    Machino, Hidenori
    Kobayashi, Kazuma
    Asada, Ken
    Komatsu, Masaaki
    Kaneko, Syuzo
    Sugiyama, Masashi
    Hamamoto, Ryuji
    JOURNAL OF MEDICAL SYSTEMS, 2024, 48 (01)
  • [14] A comparative study of vision transformers and convolutional neural networks: sugarcane leaf diseases identification
    Süleyman Öğrekçi
    Yavuz Ünal
    Muhammet Nuri Dudak
    European Food Research and Technology, 2023, 249 : 1833 - 1843
  • [15] Comprehensive comparison between vision transformers and convolutional neural networks for face recognition tasks
    Rodrigo, Marcos
    Cuevas, Carlos
    Garcia, Narciso
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [16] A comparative study of vision transformers and convolutional neural networks: sugarcane leaf diseases identification
    Ogrekci, Suleyman
    Unal, Yavuz
    Dudak, Muhammet Nuri
    EUROPEAN FOOD RESEARCH AND TECHNOLOGY, 2023, 249 (07) : 1833 - 1843
  • [17] Vision transformers for cotton boll segmentation: Hyperparameters optimization and comparison with convolutional neural networks
    Singh, Naseeb
    Tewari, V. K.
    Biswas, P. K.
    INDUSTRIAL CROPS AND PRODUCTS, 2025, 223
  • [18] Utilizing convolutional neural networks and vision transformers for precise corn leaf disease identification
    Ishak Pacal
    Gültekin Işık
    Neural Computing and Applications, 2025, 37 (4) : 2479 - 2496
  • [19] A Comparative Evaluation between Convolutional Neural Networks and Vision Transformers for COVID-19 Detection
    Nafisah, Saad I.
    Muhammad, Ghulam
    Hossain, M. Shamim
    AlQahtani, Salman A.
    MATHEMATICS, 2023, 11 (06)
  • [20] Deep Learning Techniques for Colorectal Cancer Detection: Convolutional Neural Networks vs Vision Transformers
    Sari, Meriem
    Moussaoui, Abdelouahab
    Hadid, Abdennour
    PROGRAM OF THE 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND AUTOMATIC CONTROL, ICEEAC 2024, 2024