Multimodal action recognition: a comprehensive survey on temporal modeling

Cited by: 1
Authors
Shabaninia, Elham [1, 2]
Nezamabadi-pour, Hossein [2]
Shafizadegan, Fatemeh [3]
Affiliations
[1] Grad Univ Adv Technol, Fac Sci & Modern Technol, Dept Appl Math, Kerman 7631818356, Iran
[2] Shahid Bahonar Univ Kerman, Dept Elect Engn, Kerman 76169133, Iran
[3] Univ Isfahan, Dept Comp Engn, Esfahan 8174673441, Iran
Funding
National Science Foundation (United States)
Keywords
Temporal modeling; Action recognition; Deep learning; Transformer; NEURAL-NETWORKS; ATTENTION; LSTM; VISION; FUSION; CLASSIFICATION;
DOI
10.1007/s11042-023-17345-y
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In action recognition that relies on visual information, activities are recognized through spatio-temporal features from different modalities. Temporal modeling has been a long-standing challenge in this field. Only a limited number of techniques, such as pre-computed motion features, three-dimensional (3D) filters, and recurrent neural networks (RNNs), are used in deep learning-based approaches to model motion information. However, the success of transformers in modeling long-range dependencies in natural language processing tasks has recently caught the attention of other domains, including speech, image, and video, as they can rely entirely on self-attention without using sequence-aligned RNNs or convolutions. Although the application of transformers to action recognition is relatively new, a substantial body of research on this topic has appeared in the last few years. This paper reviews recent progress in deep learning methods for modeling temporal variations in multimodal human action recognition. Specifically, it focuses on methods that use transformers for temporal modeling, highlighting their key features and the modalities they employ, while also identifying opportunities and challenges for future research.
Pages: 59439-59489
Page count: 51