Multimodal vision-based human action recognition using deep learning: a review

Cited: 3
Authors
Shafizadegan, Fatemeh [1 ]
Naghsh-Nilchi, Ahmad R. [1 ]
Shabaninia, Elham [2 ]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Dept Artificial Intelligence, Esfahan, Iran
[2] Grad Univ Adv Technol, Fac Sci & Modern Technol, Dept Appl Math, Kerman, Iran
Keywords
Deep learning; Human action recognition; Multimodality; Visual modality; Convolutional neural networks; Oriented principal components; Hand gesture recognition; Combining CNN streams; RGB-D; Real-time; Depth; Dataset; Multiview; Videos
DOI
10.1007/s10462-024-10730-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Vision-based Human Action Recognition (HAR) is an active topic in computer vision, and deep learning-based HAR has recently shown promising results. Using a single data modality is the common approach; however, fusing different data sources conveys complementary information and improves results. This paper comprehensively reviews deep learning-based HAR methods that use multiple visual data modalities. Its main contribution is a four-level categorization of existing methods, which enables an in-depth, comparable analysis of approaches from several perspectives. At the first level, methods are categorized by the modalities they employ. At the second level, the methods in each first-level category are classified by whether all modalities are required at test time or some may be missing. At the third level, the complete-modality and missing-modality branches are categorized by the approaches they take. Finally, similar frameworks within each third-level category are grouped together. In addition, a comprehensive comparison of publicly available benchmark datasets is provided, which helps in comparing and choosing suitable datasets for a task or in developing new ones. The paper also compares the performance of state-of-the-art methods on benchmark datasets, and the review concludes by highlighting several future directions.
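As a concrete illustration of the fusion idea the abstract describes, the following is a minimal, hypothetical sketch (not the paper's method) of a two-stream, score-level late-fusion classifier in PyTorch, assuming paired RGB and depth inputs; every module name, layer size, and the RGB+depth pairing is an assumption chosen for brevity.

# Illustrative sketch only: a minimal two-stream, late-fusion classifier for
# multimodal HAR in PyTorch. The RGB+depth pairing, module names, and sizes
# are assumptions; the reviewed paper surveys many such designs rather than
# prescribing this one.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Tiny CNN mapping one modality (C x H x W) to a feature vector."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class LateFusionHAR(nn.Module):
    """One encoder per modality; per-modality class scores are averaged at
    the end (late fusion), so complementary modalities can correct each
    other's errors."""
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.rgb_stream = StreamEncoder(3, feat_dim)    # RGB frames
        self.depth_stream = StreamEncoder(1, feat_dim)  # aligned depth maps
        self.rgb_head = nn.Linear(feat_dim, num_classes)
        self.depth_head = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb, depth):
        logits_rgb = self.rgb_head(self.rgb_stream(rgb))
        logits_depth = self.depth_head(self.depth_stream(depth))
        return (logits_rgb + logits_depth) / 2  # score-level (late) fusion

if __name__ == "__main__":
    model = LateFusionHAR(num_classes=10)
    rgb = torch.randn(2, 3, 112, 112)
    depth = torch.randn(2, 1, 112, 112)
    print(model(rgb, depth).shape)  # torch.Size([2, 10])

Score-level (late) fusion is only one of the fusion points such reviews cover; a feature-level (early or intermediate) fusion variant would instead concatenate the two stream features before a single shared classification head.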
Pages: 85
Related Papers
50 records in total
  • [1] A Review on Computer Vision-Based Methods for Human Action Recognition
    Al-Faris, Mahmoud
    Chiverton, John
    Ndzi, David
    Ahmed, Ahmed Isam
    JOURNAL OF IMAGING, 2020, 6 (06)
  • [2] An Extensive Analysis of the Vision-based Deep Learning Techniques for Action Recognition
    Manasa, R.
    Shukla, Ritika
    Saranya, K. C.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (02) : 604 - 611
  • [3] Literature review of vision-based dynamic gesture recognition using deep learning techniques
    Jain, Rahul
    Karsh, Ram Kumar
    Barbhuiya, Abul Abbas
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (22)
  • [4] Vision-based human fall detection systems using deep learning: A review
    Alam, Ekram
    Sufian, Abu
    Dutta, Paramartha
    Leo, Marco
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 146
  • [5] A survey on vision-based human action recognition
    Poppe, Ronald
    IMAGE AND VISION COMPUTING, 2010, 28 (06) : 976 - 990
  • [6] A Review on Human Activity Recognition Using Vision-Based Method
    Zhang, Shugang
    Wei, Zhiqiang
    Nie, Jie
    Huang, Lei
    Wang, Shuang
    Li, Zhen
    JOURNAL OF HEALTHCARE ENGINEERING, 2017, 2017
  • [7] Episodic Reasoning for Vision-Based Human Action Recognition
    Santofimia, Maria J.
    Martinez-del-Rincon, Jesus
    Nebel, Jean-Christophe
    SCIENTIFIC WORLD JOURNAL, 2014
  • [8] An Overview of the Vision-Based Human Action Recognition Field
    Camarena, Fernando
    Gonzalez-Mendoza, Miguel
    Chang, Leonardo
    Cuevas-Ascencio, Ricardo
    MATHEMATICAL AND COMPUTATIONAL APPLICATIONS, 2023, 28 (02)
  • [9] Human Action Recognition using Computer Vision and Deep Learning Techniques
    Ganta, Suresh
    Desu, Devi Sri
    Golla, Aishwarya
    Kumar, M. Ashok
    2023 ADVANCED COMPUTING AND COMMUNICATION TECHNOLOGIES FOR HIGH PERFORMANCE APPLICATIONS, ACCTHPA, 2023
  • [10] Deep learning in vision-based static hand gesture recognition
    Oyedotun, Oyebade K.
    Khashman, Adnan
    NEURAL COMPUTING AND APPLICATIONS, 2017, 28 : 3941 - 3951