Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data?

Cited by: 49
Authors
Moutik, Oumaima [1]
Sekkat, Hiba [1]
Tigani, Smail [1]
Chehri, Abdellah [2]
Saadane, Rachid [3]
Tchakoucht, Taha Ait [1]
Paul, Anand [4]
Affiliations
[1] Euro Mediterranean Univ, Euromed Res Ctr, Engn Unit, Fes 30030, Morocco
[2] Royal Mil Coll Canada, Dept Math & Comp Sci, Kingston, ON K7K 7B4, Canada
[3] Hassania Sch Publ Works, SIRC LaGeS, Casablanca 8108, Morocco
[4] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu 41566, South Korea
Keywords
convolutional neural networks; vision transformers; recurrent neural networks; conversational systems; action recognition; natural language understanding; action recognitions; COMPUTER VISION; ATTENTION;
DOI
10.3390/s23020734
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Understanding actions in videos remains a significant challenge in computer vision and has been the subject of numerous studies over the last decades. Convolutional neural networks (CNNs) are a significant component of this topic and have played a crucial role in the renown of deep learning. Inspired by the human vision system, CNNs have been applied to visual data and have solved challenges across a range of computer vision tasks and video/image analysis, including action recognition (AR). Recently, however, following the success of the transformer in natural language processing (NLP), transformers began to set new trends in vision tasks, prompting a discussion of whether Vision Transformer (ViT) models will replace CNNs for action recognition in video clips. This paper examines this trending topic in detail: it studies CNNs and Transformers for action recognition separately and then compares them in terms of the accuracy-complexity trade-off. Finally, based on the outcome of the performance analysis, it discusses whether CNNs or Vision Transformers will win the race.
Pages: 21
Related papers
50 in total
  • [31] Froth image based monitoring of platinum group metals flotation with vision transformers and convolutional neural networks
    Liu, Xiu
    Aldrich, Chris
    MINERALS ENGINEERING, 2024, 215
  • [32] CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation
    Lei, Tao
    Sun, Rui
    Wang, Xuan
    Wang, Yingbo
    He, Xi
    Nandi, Asoke
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1017 - 1025
  • [33] Deep Learning for Active Region Classification: A Systematic Study from Convolutional Neural Networks to Vision Transformers
    Legnaro, Edoardo
    Guastavino, Sabrina
    Piana, Michele
    Massone, Anna Maria
    ASTROPHYSICAL JOURNAL, 2025, 981 (02):
  • [34] Optimizing Strawberry Disease and Quality Detection with Vision Transformers and Attention-Based Convolutional Neural Networks
    Aghamohammadesmaeilketabforoosh, Kimia
    Nikan, Soodeh
    Antonini, Giorgio
    Pearce, Joshua M.
    FOODS, 2024, 13 (12)
  • [35] On the Correspondence between Human Vision and Convolutional Neural Networks: A Visual Quality Assessment Perspective
    Mahmoudpour, Saeed
    Schelkens, Peter
    2023 15TH INTERNATIONAL CONFERENCE ON QUALITY OF MULTIMEDIA EXPERIENCE, QOMEX, 2023, : 153 - 158
  • [36] Convolutional Neural Networks Implementations for Computer Vision
    Michalski, Pawel
    Ruszczak, Bogdan
    Tomaszewski, Michal
    BIOMEDICAL ENGINEERING AND NEUROSCIENCE, 2018, 720 : 98 - 110
  • [37] Active Vision in the Era of Convolutional Neural Networks
    Gallos, Dimitri
    Ferrie, Frank
    2019 16TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV 2019), 2019, : 81 - 88
  • [38] A review of convolutional neural networks in computer vision
    Xia Zhao
    Limin Wang
    Yufei Zhang
    Xuming Han
    Muhammet Deveci
    Milan Parmar
    Artificial Intelligence Review, 57
  • [39] A review of convolutional neural networks in computer vision
    Zhao, Xia
    Wang, Limin
    Zhang, Yufei
    Han, Xuming
    Deveci, Muhammet
    Parmar, Milan
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (04)
  • [40] ECVNet: A Fusion Network of Efficient Convolutional Neural Networks and Visual Transformers for Tomato Leaf Disease Identification
    Zou, Fendong
    Hua, Jing
    Zhu, Yuanhao
    Deng, Jize
    He, Ruimin
    AGRONOMY-BASEL, 2024, 14 (12):