Multi-modal fusion method for human action recognition based on IALC

Cited by: 2
Authors
Zhang, Yinhuan [1 ,2 ]
Xiao, Qinkun [1 ,3 ]
Liu, Xing [3 ]
Wei, Yongquan [4 ]
Chu, Chaoqin [1 ]
Xue, Jingyun [1 ]
Affiliations
[1] Xian Technol Univ, Sch Mechatron Engn, Xian, Peoples R China
[2] Weinan Vocat & Tech Coll, Sch Construct Engn, Weinan, Peoples R China
[3] Xian Technol Univ, Sch Elect Informat Engn, Xian 710021, Peoples R China
[4] CRRC Tangshan Co Ltd, Tangshan, Peoples R China
Keywords
Fusion methods - Hidden-Markov models - Human behaviors - Human-action recognition - Multi-modal - Multi-modal fusion - Performance - Recognition accuracy - Sequence features - Video sequences
DOI
10.1049/ipr2.12640
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In occlusion and interaction scenarios, the accuracy of human action recognition (HAR) is low. To address this issue, this paper proposes a novel multi-modal fusion framework for HAR. The framework introduces a module called improved attention long short-term memory (IAL), which combines an improved SE-ResNet50 (ISE-ResNet50) with long short-term memory (LSTM). IAL extracts both the video sequence features and the skeleton sequence features of human behaviour. To improve HAR performance at a high semantic level, the resulting multi-modal sequence features are fed into a coupled hidden Markov model (CHMM), and a multi-modal IAL+CHMM method called IALC is developed on the basis of a probabilistic graphical model. To test the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics 400k, and ActivityNet datasets; the recognition accuracies obtained are 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that in complex environments the proposed IALC-based multi-modal fusion method for HAR achieves more accurate recognition results.
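The abstract does not give the CHMM formulation, so the following is only a minimal sketch of how a generic two-chain coupled HMM can score a pair of aligned sequences with the forward algorithm. Everything in it is an assumption made for illustration: the function name `chmm_forward`, the use of discrete observation symbols (the paper's features are sequence features from IAL, whose form is not stated here), the two-state chains, and all of the toy probabilities. It is not the authors' implementation.

```python
from itertools import product


def chmm_forward(obs_a, obs_b, init_a, init_b, trans_a, trans_b, emit_a, emit_b):
    """Likelihood of two aligned discrete sequences under a two-chain coupled HMM.

    Hypothetical parameter layout (not taken from the paper):
      trans_a[i][j][k] = P(a_t = k | a_{t-1} = i, b_{t-1} = j)
      trans_b[i][j][l] = P(b_t = l | a_{t-1} = i, b_{t-1} = j)
      emit_a[i][o]     = P(symbol o | a_t = i), likewise emit_b
    """
    states = list(product(range(len(init_a)), range(len(init_b))))
    # alpha[(i, j)] = P(observations so far, a_t = i, b_t = j)
    alpha = {
        (i, j): init_a[i] * init_b[j] * emit_a[i][obs_a[0]] * emit_b[j][obs_b[0]]
        for i, j in states
    }
    for xa, xb in zip(obs_a[1:], obs_b[1:]):
        new_alpha = {}
        for k, l in states:
            # Each chain's next state depends on the previous states of BOTH chains.
            reach = sum(
                alpha[(i, j)] * trans_a[i][j][k] * trans_b[i][j][l] for i, j in states
            )
            new_alpha[(k, l)] = reach * emit_a[k][xa] * emit_b[l][xb]
        alpha = new_alpha
    return sum(alpha.values())


# Toy parameters, invented for illustration only.
init_a = [0.6, 0.4]
init_b = [0.5, 0.5]
trans_a = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
trans_b = [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
emit_a = [[0.8, 0.2], [0.3, 0.7]]
emit_b = [[0.6, 0.4], [0.1, 0.9]]
params = (init_a, init_b, trans_a, trans_b, emit_a, emit_b)

# Score one pair of aligned symbol sequences (e.g. one per modality).
likelihood = chmm_forward([0, 1, 1], [0, 0, 1], *params)
print(likelihood)
```

In a classification setting, one common pattern is to fit one such model per action class and pick the class whose model assigns the highest likelihood to the observed pair of sequences; whether IALC does exactly this is not stated in the abstract.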
Pages: 388-400
Number of pages: 13