Multimodal cooperative self-attention network for action recognition

Cited by: 2
Authors
Zhong, Zhuokun [1 ]
Hou, Zhenjie [1 ]
Liang, Jiuzhen [1 ]
Lin, En [2 ]
Shi, Haiyong [1 ]
Affiliations
[1] Changzhou Univ, Sch Comp & Artificial Intelligence, Changzhou 213000, Peoples R China
[2] Goldcard Smart Grp Co Ltd, Hangzhou, Peoples R China
Keywords
computer vision; image fusion;
DOI
10.1049/ipr2.12754
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multimodal human behaviour recognition is a research hotspot in computer vision. To make full use of both skeleton and depth data, this paper constructs a new multimodal recognition scheme built around the self-attention mechanism. The system comprises a transformer-based skeleton self-attention subnetwork and a CNN-based depth self-attention subnetwork. Within the skeleton subnetwork, the paper proposes a motion synergy space feature that integrates the information of each joint according to the entirety and synergy of human motion, together with a quantitative measure of each joint's contribution to the motion. The outputs of the skeleton and depth subnetworks are fused, and the approach is evaluated on the NTU RGB+D and UTD-MHAD datasets. The method achieves a 90% recognition rate on UTD-MHAD; on NTU RGB+D it reaches 90.5% under the cross-subject (CS) protocol and 94.7% under the cross-view (CV) protocol. Experimental results show that the proposed network achieves a high recognition rate and outperforms most current methods.
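To illustrate the two-stream design described in the abstract, the sketch below shows one plausible way to combine a transformer-based skeleton stream with a CNN-based depth stream using late score fusion. This is a minimal, hypothetical PyTorch example, not the authors' implementation: the class names, layer sizes, joint count, and fusion weight are all assumptions, and the paper's motion synergy space feature and the depth stream's self-attention module are omitted.

import torch
import torch.nn as nn

class SkeletonSelfAttentionStream(nn.Module):
    """Transformer encoder over per-frame joint vectors (assumed input layout)."""
    def __init__(self, num_joints=25, joint_dim=3, d_model=128, num_classes=60):
        super().__init__()
        self.embed = nn.Linear(num_joints * joint_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                # x: (batch, frames, num_joints * joint_dim)
        h = self.encoder(self.embed(x))  # self-attention across the frame sequence
        return self.head(h.mean(dim=1))  # temporal average pooling -> class logits

class DepthSelfAttentionStream(nn.Module):
    """Small CNN over depth maps; the paper's self-attention block is not modelled here."""
    def __init__(self, num_classes=60):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, d):                # d: (batch, 1, height, width) depth map
        return self.head(self.features(d))

def fuse_scores(skel_logits, depth_logits, alpha=0.5):
    """Late score fusion of the two streams; the weight alpha is an assumption."""
    return alpha * skel_logits.softmax(-1) + (1 - alpha) * depth_logits.softmax(-1)

# Usage sketch: two clips of 32 frames with 25 joints (x, y, z) plus one depth map each.
skel, depth = SkeletonSelfAttentionStream(), DepthSelfAttentionStream()
x = torch.randn(2, 32, 25 * 3)
d = torch.randn(2, 1, 224, 224)
scores = fuse_scores(skel(x), depth(d))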
Pages: 1775-1783
Page count: 9
Related Papers
50 records in total
  • [1] MGSAN: multimodal graph self-attention network for skeleton-based action recognition
    Wang, Junyi
    Li, Ziao
    Liu, Bangli
    Cai, Haibin
    Saada, Mohamad
    Meng, Qinggang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [2] MULTIMODAL CROSS- AND SELF-ATTENTION NETWORK FOR SPEECH EMOTION RECOGNITION
    Sun, Licai
    Liu, Bin
    Tao, Jianhua
    Lian, Zheng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 4275 - 4279
  • [3] Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition
    Balaji, Pranav
    Prusty, Manas Ranjan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [4] An efficient self-attention network for skeleton-based action recognition
    Xiaofei Qin
    Rui Cai
    Jiabin Yu
    Changxiang He
    Xuedian Zhang
    Scientific Reports, 12 (1)
  • [5] Self-Attention Network for Skeleton-based Human Action Recognition
    Cho, Sangwoo
    Maqbool, Muhammad Hasan
    Liu, Fei
    Foroosh, Hassan
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 624 - 633
  • [6] An efficient self-attention network for skeleton-based action recognition
    Qin, Xiaofei
    Cai, Rui
    Yu, Jiabin
    He, Changxiang
    Zhang, Xuedian
    SCIENTIFIC REPORTS, 2022, 12 (01):
  • [7] SPATIO-TEMPORAL SLOWFAST SELF-ATTENTION NETWORK FOR ACTION RECOGNITION
    Kim, Myeongjun
    Kim, Taehun
    Kim, Daijin
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2206 - 2210
  • [8] Skeleton action recognition via graph convolutional network with self-attention module
    Li, Min
    Chen, Ke
    Bai, Yunqing
    Pei, Jihong
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (04): : 2848 - 2864
  • [9] MGSAN: multimodal graph self-attention network for skeleton-based action recognition
    Junyi Wang
    Ziao Li
    Bangli Liu
    Haibin Cai
    Mohamad Saada
    Qinggang Meng
    Multimedia Systems, 2024, 30 (6)
  • [10] The Multimodal Scene Recognition Method Based on Self-Attention and Distillation
    Sun, Ning
    Xu, Wei
    Liu, Jixin
    Chai, Lei
    Sun, Haian
    IEEE MULTIMEDIA, 2024, 31 (04) : 25 - 36