Multi-modal fusion method for human action recognition based on IALC

Cited by: 2
Authors
Zhang, Yinhuan [1,2]
Xiao, Qinkun [1,3]
Liu, Xing [3]
Wei, Yongquan [4]
Chu, Chaoqin [1]
Xue, Jingyun [1]
Affiliations
[1] Xian Technol Univ, Sch Mechatron Engn, Xian, Peoples R China
[2] Weinan Vocat & Tech Coll, Sch Construct Engn, Weinan, Peoples R China
[3] Xian Technol Univ, Sch Elect Informat Engn, Xian 710021, Peoples R China
[4] CRRC Tangshan Co Ltd, Tangshan, Peoples R China
Keywords
Fusion methods; Hidden Markov models; Human behaviours; Human action recognition; Multi-modal; Multi-modal fusion; Performance; Recognition accuracy; Sequence features; Video sequences
DOI: 10.1049/ipr2.12640
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In occlusion and interaction scenarios, the accuracy of human action recognition (HAR) is low. To address this issue, this paper proposes a novel multi-modal fusion framework for HAR. Within this framework, a module called improved attention long short-term memory (IAL) is proposed, which combines an improved SE-ResNet50 (ISE-ResNet50) with long short-term memory (LSTM). IAL extracts the video sequence features and the skeleton sequence features of human behaviour. To improve HAR performance at a high semantic level, the obtained multi-modal sequence features are fed into a coupled hidden Markov model (CHMM), and a multi-modal IAL+CHMM method, called IALC, is developed on the basis of a probabilistic graphical model. To evaluate the proposed method, experiments are conducted on the HMDB51, UCF101, Kinetics 400k, and ActivityNet datasets; the recognition accuracies obtained are 86.40%, 97.78%, 81.12%, and 69.36%, respectively. The experimental results show that, in complex environments, the proposed IALC-based multi-modal fusion method for HAR achieves more accurate recognition results.
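As a rough illustration of the pipeline described in the abstract, the sketch below builds one IAL-style branch: a CNN backbone followed by an LSTM over per-frame features, with one such branch intended per modality (RGB video, skeleton). This is only a minimal sketch under stated assumptions: a standard torchvision resnet50 stands in for the paper's ISE-ResNet50 (whose attention design is not detailed here), and the names (IALBranch), dimensions, and the downstream CHMM fusion step are illustrative, not the authors' implementation.

```python
# Minimal sketch of an IAL-style per-modality feature extractor.
# Assumption: a plain torchvision resnet50 replaces the paper's ISE-ResNet50.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class IALBranch(nn.Module):
    """CNN backbone + LSTM over a frame sequence (one modality, e.g. RGB video)."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=51):
        super().__init__()
        backbone = resnet50(weights=None)       # placeholder for ISE-ResNet50
        backbone.fc = nn.Identity()             # keep the 2048-d pooled features
        self.backbone = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):                   # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)            # (B*T, 3, H, W)
        feats = self.backbone(frames).view(b, t, -1)
        seq, _ = self.lstm(feats)               # per-frame temporal features
        return seq, self.classifier(seq[:, -1]) # sequence features + clip logits


# Usage: one branch per modality; the per-frame feature sequences would then be
# discretised/fused and passed to a coupled HMM (CHMM) for the final decision,
# a step not shown here because it is not part of standard libraries.
rgb_branch = IALBranch()
dummy_clip = torch.randn(2, 8, 3, 224, 224)
seq_feats, logits = rgb_branch(dummy_clip)
print(seq_feats.shape, logits.shape)            # (2, 8, 512), (2, 51)
```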
Pages: 388-400
Number of pages: 13
Related papers
50 records
  • [1] Hybrid Multi-modal Fusion for Human Action Recognition
    Seddik, Bassem
    Gazzah, Sami
    Ben Amara, Najoua Essoukri
    [J]. IMAGE ANALYSIS AND RECOGNITION, ICIAR 2017, 2017, 10317 : 201 - 209
  • [2] Rethinking Fusion Baselines for Multi-modal Human Action Recognition
    Jiang, Hongda
    Li, Yanghao
    Song, Sijie
    Liu, Jiaying
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 178 - 187
  • [3] Human activity recognition based on multi-modal fusion
    Zhang, Cheng
    Zu, Tianqi
    Hou, Yibin
    He, Jian
    Yang, Shengqi
    Dong, Ruihai
    [J]. CCF TRANSACTIONS ON PERVASIVE COMPUTING AND INTERACTION, 2023, 5 (03) : 321 - 332
  • [5] A Novel Chinese Character Recognition Method Based on Multi-Modal Fusion
    Liu, Jin
    Lyu, Shiqi
    Yu, Chao
    Yang, Yihe
    Luan, Cuiju
    [J]. FUZZY SYSTEMS AND DATA MINING V (FSDM 2019), 2019, 320 : 487 - 492
  • [6] Multi-View and Multi-Modal Action Recognition with Learned Fusion
    Ardianto, Sandy
    Hang, Hsueh-Ming
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1601 - 1604
  • [7] MULTI-MODAL FUSION WITH OBSERVATION POINTS FOR SKELETON ACTION RECOGNITION
    Singh, Iqbal
    Zhu, Xiaodan
    Greenspan, Michael
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1781 - 1785
  • [8] Multi-modal Transformer for Indoor Human Action Recognition
    Do, Jeonghyeok
    Kim, Munchurl
    [J]. 2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 1155 - 1160
  • [9] Human-Object Contour for Action Recognition with Attentional Multi-modal Fusion Network
    Yu, Miao
    Zhang, Weizhe
    Zeng, Qingxiang
    Wang, Chao
    Li, Jie
    [J]. 2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (ICAIIC 2019), 2019, : 241 - 246
  • [10] Language-guided Multi-Modal Fusion for Video Action Recognition
    Hsiao, Jenhao
    Li, Yikang
    Ho, Chiuman
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3151 - 3155