From CNNs to Transformers in Multimodal Human Action Recognition: A Survey

Cited by: 2
Authors
Shaikh, Muhammad Bilal [1, 2]
Chai, Douglas [2 ]
Islam, Syed Muhammad Shamsul [3 ]
Akhtar, Naveed [4 ]
Affiliations
[1] Edith Cowan Univ, Sch Engn, Joondalup, WA, Australia
[2] Molycop, Balcatta, WA, Australia
[3] Edith Cowan Univ, Sch Sci, Joondalup, WA, Australia
[4] Univ Melbourne, Melbourne, Vic, Australia
Keywords
Multimodal; action recognition; fusion; deep learning; neural networks; RGB-D; streams
DOI
10.1145/3664815
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Due to its widespread applications, human action recognition is one of the most widely studied research problems in Computer Vision. Recent studies have shown that addressing it using multimodal data leads to superior performance compared to relying on a single data modality. During the adoption of deep learning for visual modelling in the past decade, action recognition approaches have mainly relied on Convolutional Neural Networks (CNNs). However, the recent rise of Transformers in visual modelling is now also causing a paradigm shift for the action recognition task. This survey captures this transition while focusing on Multimodal Human Action Recognition (MHAR). Unique to the induction of multimodal computational models is the process of 'fusing' the features of the individual data modalities. Hence, we specifically focus on the fusion design aspects of the MHAR approaches. We analyze the classic and emerging techniques in this regard, while also highlighting the popular trends in the adaptation of CNN and Transformer building blocks for the overall problem. In particular, we emphasize recent design choices that have led to more efficient MHAR models. Unlike existing reviews, which discuss Human Action Recognition from a broad perspective, this survey is specifically aimed at pushing the boundaries of MHAR research by identifying promising architectural and fusion design choices to train practicable models. We also provide an overview of the multimodal datasets from the viewpoint of their scale and evaluation. Finally, building on the reviewed literature, we discuss the challenges and future avenues for MHAR.
Pages: 24
Related papers
50 in total
  • [21] Multimodal human action recognition based on spatio-temporal action representation recognition model
    Wu, Qianhan
    Huang, Qian
    Li, Xing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (11) : 16409 - 16430
  • [22] Multimodal human action recognition based on spatio-temporal action representation recognition model
    Qianhan Wu
    Qian Huang
    Xing Li
    Multimedia Tools and Applications, 2023, 82 : 16409 - 16430
  • [23] MULTIMODAL HUMAN ACTION RECOGNITION IN ASSISTIVE HUMAN-ROBOT INTERACTION
    Rodomagoulakis, I.
    Kardaris, N.
    Pitsikalis, V.
    Mavroudi, E.
    Katsamanis, A.
    Tsiami, A.
    Maragos, P.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016 : 2702 - 2706
  • [24] A Temporal Order Modeling Approach to Human Action Recognition from Multimodal Sensor Data
    Ye, Jun
    Hu, Hao
    Qi, Guo-Jun
    Hua, Kien A.
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2017, 13 (02) : 1 - 22
  • [25] Localization and recognition of human action in 3D using transformers
    Jiankai Sun
    Linjiang Huang
    Hongsong Wang
    Chuanyang Zheng
    Jianing Qiu
    Md Tauhidul Islam
    Enze Xie
    Bolei Zhou
    Lei Xing
    Arjun Chandrasekaran
    Michael J. Black
    Communications Engineering, 3 (1):
  • [26] A survey on intelligent human action recognition techniques
    Kumar, Rahul
    Kumar, Shailender
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (17) : 52653 - 52709
  • [27] A survey on intelligent human action recognition techniques
    Rahul Kumar
    Shailender Kumar
    Multimedia Tools and Applications, 2024, 83 : 52653 - 52709
  • [28] Advances in human action recognition: an updated survey
    Abu-Bakar, Syed A. R.
    IET IMAGE PROCESSING, 2019, 13 (13) : 2381 - 2394
  • [29] A Survey of Human Action Recognition and Posture Prediction
    Nan Ma
    Zhixuan Wu
    Yiu-ming Cheung
    Yuchen Guo
    Yue Gao
    Jiahong Li
    Beiyan Jiang
    Tsinghua Science and Technology, 2022, 27 (06) : 973 - 1001
  • [30] A Survey of Human Action Recognition and Posture Prediction
    Ma, Nan
    Wu, Zhixuan
    Cheung, Yiu-ming
    Guo, Yuchen
    Gao, Yue
    Li, Jiahong
    Jiang, Beiyan
    TSINGHUA SCIENCE AND TECHNOLOGY, 2022, 27 (06) : 973 - 1001