Multimodal vision-based human action recognition using deep learning: a review

Cited: 3
Authors
Shafizadegan, Fatemeh [1 ]
Naghsh-Nilchi, Ahmad R. [1 ]
Shabaninia, Elham [2 ]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence, Esfahan, Iran
[2] Grad Univ Adv Technol, Fac Sci & Modern Technol, Dept Appl Math, Kerman, Iran
Keywords
Deep learning; Human action recognition; Multimodality; Visual modality; Convolutional neural networks; Oriented principal components; Hand gesture recognition; Combining CNN streams; RGB-D; Real-time; Depth; Dataset; Multiview; Videos
DOI
10.1007/s10462-024-10730-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Vision-based Human Action Recognition (HAR) is an active topic in computer vision, and deep learning-based HAR has recently shown promising results. Using a single data modality is the common approach; however, fusing different data sources conveys complementary information and improves results. This paper comprehensively reviews deep learning-based HAR methods that use multiple visual data modalities. Its main contribution is a four-level categorization of existing methods, which enables an in-depth, comparable analysis of approaches from several perspectives. At the first level, methods are categorized by the modalities they employ. At the second level, the methods in each first-level category are classified by whether all modalities are required at test time or some may be missing. At the third level, the complete-modality and missing-modality branches are categorized by the approaches they take. Finally, similar frameworks within each third-level category are grouped together. In addition, a comprehensive comparison of publicly available benchmark datasets is provided, which helps in comparing and choosing suitable datasets for a task or in developing new ones. The paper also compares the performance of state-of-the-art methods on benchmark datasets, and the review concludes by highlighting several future directions.
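As a concrete illustration of the fusion idea the abstract describes, the following is a minimal, hypothetical sketch (not the paper's method) of a two-stream, score-level late-fusion classifier in PyTorch, assuming paired RGB and depth inputs; every module name, layer size, and the RGB+depth pairing is an assumption chosen for brevity.

# Illustrative sketch only: a minimal two-stream, late-fusion classifier for
# multimodal HAR in PyTorch. The RGB+depth pairing, module names, and sizes
# are assumptions; the reviewed paper surveys many such designs rather than
# prescribing this one.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Tiny CNN mapping one modality (C x H x W) to a feature vector."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class LateFusionHAR(nn.Module):
    """One encoder per modality; per-modality class scores are averaged at
    the end (late fusion), so complementary modalities can correct each
    other's errors."""
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.rgb_stream = StreamEncoder(3, feat_dim)    # RGB frames
        self.depth_stream = StreamEncoder(1, feat_dim)  # aligned depth maps
        self.rgb_head = nn.Linear(feat_dim, num_classes)
        self.depth_head = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb, depth):
        logits_rgb = self.rgb_head(self.rgb_stream(rgb))
        logits_depth = self.depth_head(self.depth_stream(depth))
        return (logits_rgb + logits_depth) / 2  # score-level (late) fusion

if __name__ == "__main__":
    model = LateFusionHAR(num_classes=10)
    rgb = torch.randn(2, 3, 112, 112)
    depth = torch.randn(2, 1, 112, 112)
    print(model(rgb, depth).shape)  # torch.Size([2, 10])

Score-level (late) fusion is only one of the fusion points such reviews cover; a feature-level (early or intermediate) fusion variant would instead concatenate the two stream features before a single shared classification head.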
Pages: 85
Related Papers
50 records in total
  • [1] A Review on Computer Vision-Based Methods for Human Action Recognition
    Al-Faris, Mahmoud
    Chiverton, John
    Ndzi, David
    Ahmed, Ahmed Isam
    JOURNAL OF IMAGING, 2020, 6 (06)
  • [2] An Extensive Analysis of the Vision-based Deep Learning Techniques for Action Recognition
    Manasa, R.
    Shukla, Ritika
    Saranya, K. C.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 604 - 611
  • [3] Literature review of vision-based dynamic gesture recognition using deep learning techniques
    Jain, Rahul
    Karsh, Ram Kumar
    Barbhuiya, Abul Abbas
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (22)
  • [4] Vision-based human fall detection systems using deep learning: A review
    Alam, Ekram
    Sufian, Abu
    Dutta, Paramartha
    Leo, Marco
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 146
  • [5] A survey on vision-based human action recognition
    Poppe, Ronald
    IMAGE AND VISION COMPUTING, 2010, 28 (06) : 976 - 990
  • [6] A Review on Human Activity Recognition Using Vision-Based Method
    Zhang, Shugang
    Wei, Zhiqiang
    Nie, Jie
    Huang, Lei
    Wang, Shuang
    Li, Zhen
    JOURNAL OF HEALTHCARE ENGINEERING, 2017, 2017
  • [7] Episodic Reasoning for Vision-Based Human Action Recognition
    Santofimia, Maria J.
    Martinez-del-Rincon, Jesus
    Nebel, Jean-Christophe
    SCIENTIFIC WORLD JOURNAL, 2014
  • [8] An Overview of the Vision-Based Human Action Recognition Field
    Camarena, Fernando
    Gonzalez-Mendoza, Miguel
    Chang, Leonardo
    Cuevas-Ascencio, Ricardo
    MATHEMATICAL AND COMPUTATIONAL APPLICATIONS, 2023, 28 (02)
  • [9] Human Action Recognition using Computer Vision and Deep Learning Techniques
    Ganta, Suresh
    Desu, Devi Sri
    Golla, Aishwarya
    Kumar, M. Ashok
    2023 ADVANCED COMPUTING AND COMMUNICATION TECHNOLOGIES FOR HIGH PERFORMANCE APPLICATIONS, ACCTHPA, 2023
  • [10] Deep learning in vision-based static hand gesture recognition
    Oyedotun, Oyebade K.
    Khashman, Adnan
    NEURAL COMPUTING AND APPLICATIONS, 2017, 28 : 3941 - 3951