Multimodal cooperative self-attention network for action recognition

Cited by: 2
Authors:
Zhong, Zhuokun [1 ]
Hou, Zhenjie [1 ]
Liang, Jiuzhen [1 ]
Lin, En [2 ]
Shi, Haiyong [1 ]
Affiliations:
[1] Changzhou Univ, Sch Comp & Artificial Intelligence, Changzhou 213000, Peoples R China
[2] Goldcard Smart Grp Co Ltd, Hangzhou, Peoples R China
Keywords:
computer vision; image fusion
DOI:
10.1049/ipr2.12754
CLC number:
TP18 [Artificial Intelligence Theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
Multimodal human behaviour recognition is a research hotspot in computer vision. To make full use of both skeleton and depth data, this paper constructs a new multimodal recognition network built around the self-attention mechanism. The system comprises a transformer-based skeleton self-attention subnetwork and a CNN-based depth self-attention subnetwork. In the skeleton subnetwork, this paper proposes a motion synergy space feature that integrates the information of each joint according to the entirety and synergy of human motion, and puts forward a quantitative measure of each joint's contribution to the movement. The outputs of the two subnetworks are fused, and the method is verified on the NTU RGB+D and UTD-MHAD datasets. It achieves a 90% recognition rate on UTD-MHAD; on NTU RGB+D, it reaches 90.5% under the cross-subject (CS) protocol and 94.7% under the cross-view (CV) protocol. Experimental results show that the proposed network structure achieves a high recognition rate and outperforms most current methods.
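As a rough illustration of the two-stream design the abstract describes, the following PyTorch sketch wires a transformer self-attention subnetwork over skeleton joints to a small CNN over depth maps and fuses their class scores. It is a minimal sketch under assumed settings: the module names, layer sizes, joint count (25, as in NTU RGB+D), class count, and fusion weight alpha are illustrative choices, not the authors' published configuration.

    # Minimal sketch of a two-stream skeleton + depth network with score-level
    # fusion. All hyperparameters below are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SkeletonSelfAttention(nn.Module):
        """Treats each joint as a token and applies self-attention across joints."""
        def __init__(self, num_joints=25, coord_dim=3, d_model=64, num_classes=60):
            super().__init__()
            self.embed = nn.Linear(coord_dim, d_model)        # per-joint embedding
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, joints):                            # joints: (B, J, 3)
            tokens = self.embed(joints)                       # (B, J, d_model)
            attended = self.encoder(tokens)                   # self-attention over joints
            return self.head(attended.mean(dim=1))            # pooled logits: (B, classes)

    class DepthCNN(nn.Module):
        """Small CNN over a single-channel depth map."""
        def __init__(self, num_classes=60):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            self.head = nn.Linear(32, num_classes)

        def forward(self, depth):                             # depth: (B, 1, H, W)
            return self.head(self.features(depth).flatten(1))

    def fuse_scores(skel_logits, depth_logits, alpha=0.5):
        """Score-level fusion: weighted sum of the two streams' softmax scores."""
        return alpha * skel_logits.softmax(-1) + (1 - alpha) * depth_logits.softmax(-1)

    skel_net, depth_net = SkeletonSelfAttention(), DepthCNN()
    joints = torch.randn(2, 25, 3)                            # batch of 2, 25 joints, xyz
    depth = torch.randn(2, 1, 64, 64)
    probs = fuse_scores(skel_net(joints), depth_net(depth))
    print(probs.shape)                                        # torch.Size([2, 60])

The uniform mean pooling over joint tokens here merely stands in for the paper's motion synergy space feature and per-joint contribution measure, whose exact formulation is not reproduced.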
Pages: 1775-1783
Number of pages: 9
Related papers (50 in total)
  • [21] Self-Attention based Siamese Neural Network recognition Model
    Liu, Yuxing
    Chang, Geng
    Fu, Guofeng
    Wei, Yingchao
    Lan, Jie
    Liu, Jiarui
    2022 34TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2022: 721-724
  • [22] A Deep Dilated Convolutional Self-attention Model for Multimodal Human Activity Recognition
    Wang, Shengzhi
    Xiao, Shuo
    Wang, Yu
    Jiang, Haifeng
    Zhang, Guopeng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 791-797
  • [23] CSAT-FTCN: A Fuzzy-Oriented Model with Contextual Self-attention Network for Multimodal Emotion Recognition
    Jiang, Dazhi
    Liu, Hao
    Wei, Runguo
    Tu, Geng
    COGNITIVE COMPUTATION, 2023, 15(03): 1082-1091
  • [25] Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
    Liang, Chaolei
    Zou, Wei
    Hu, Danfeng
    Wang, JiaJun
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKS AND INTERNET OF THINGS (CNIOT 2024), 2024: 600-605
  • [26] Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
    Xiang, Wangmeng
    Li, Chao
    Wang, Biao
    Wei, Xihan
    Hua, Xian-Sheng
    Zhang, Lei
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663: 627-644
  • [27] Global Positional Self-Attention for Skeleton-Based Action Recognition
    Kim, Jaehwan
    Lee, Junsuk
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 3355-3361
  • [28] Three-Stream Network With Bidirectional Self-Attention for Action Recognition in Extreme Low Resolution Videos
    Purwanto, Didik
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26(08): 1187-1191
  • [29] Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition
    Wu, Zhize
    Ding, Yue
    Wan, Long
    Li, Teng
    Nian, Fudong
    PATTERN RECOGNITION, 2025, 159
  • [30] Spatial–temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention
    Cao, Haiwen
    Wu, Chunlei
    Lu, Jing
    Wu, Jie
    Wang, Leiquan
    SIGNAL, IMAGE AND VIDEO PROCESSING, 2023, 17: 1173-1180