3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Cited by: 0
Authors
Yang, Jun [1 ,2 ]
Sun, Shulong [2 ]
Chen, Jiayue [1 ]
Xie, Haizhen [1 ]
Wang, Yan [1 ]
Yang, Zenglong [1 ]
Affiliations
[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China
[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 16
Funding
National Natural Science Foundation of China;
Keywords
action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;
DOI
10.3390/app14167154
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Existing skeleton-based action recognition methods suffer from insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model improves action recognition performance through three main innovations: (1) conversion from skeleton points to heat maps. Using a Gaussian transform to convert skeleton point data into heat maps reduces the model's strong dependence on the original skeleton point data and enhances the stability and robustness of the data; (2) a spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism is proposed that focuses on extracting key frames and key areas within frames, which enhances the model's ability to identify behavioral patterns; (3) a multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission in the network, addresses the vanishing-gradient problem in deep networks, and helps to improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method addresses the robustness shortcomings of existing methods and improves the ability to capture spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
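The skeleton-to-heat-map conversion named in item (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `joints_to_heatmaps` and `clip_to_volume`, the `(x, y, confidence)` joint layout, the map size, and the `sigma` value are all assumptions chosen for illustration. Each joint is rendered as a 2D Gaussian centered on its pixel position, and the per-frame maps are stacked over time into a volume of the kind a 3D CNN could consume.

```python
import math


def joints_to_heatmaps(joints, height, width, sigma=1.0):
    """Render one Gaussian heat map per joint (hypothetical helper).

    joints: list of (x, y, confidence) tuples in pixel coordinates.
    Returns a list of height x width nested lists, one per joint.
    """
    heatmaps = []
    for x, y, conf in joints:
        # Value at (row, col) decays with squared distance from the joint
        # and is scaled by the joint's confidence score.
        heatmap = [
            [
                conf * math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
                for col in range(width)
            ]
            for row in range(height)
        ]
        heatmaps.append(heatmap)
    return heatmaps


def clip_to_volume(frames, height, width, sigma=1.0):
    """Stack per-frame heat maps into a [joint][frame][row][col] volume."""
    per_frame = [joints_to_heatmaps(f, height, width, sigma) for f in frames]
    num_joints = len(frames[0])
    return [
        [per_frame[t][k] for t in range(len(frames))]
        for k in range(num_joints)
    ]
```

Because a small error in a joint coordinate only shifts a smooth blob slightly, a representation like this plausibly depends less on exact coordinates than raw skeleton points do, which is consistent with the robustness claim in the abstract.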
Pages: 13
Related papers
50 in total
  • [31] Learning Semantic-Aware Spatial-Temporal Attention for Interpretable Action Recognition
    Fu, Jie
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5213 - 5224
  • [32] STA-CNN: Convolutional Spatial-Temporal Attention Learning for Action Recognition
    Yang, Hao
    Yuan, Chunfeng
    Zhang, Li
    Sun, Yunda
    Hu, Weiming
    Maybank, Stephen J.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 5783 - 5793
  • [33] Spatial-Temporal Transformer Network for Continuous Action Recognition in Industrial Assembly
    Huang, Jianfeng
    Liu, Xiang
    Hu, Huan
    Tang, Shanghua
    Li, Chenyang
    Zhao, Shaoan
    Lin, Yimin
    Wang, Kai
    Liu, Zhaoxiang
    Lian, Shiguo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT X, ICIC 2024, 2024, 14871 : 114 - 130
  • [34] A Channel-Wise Spatial-Temporal Aggregation Network for Action Recognition
    Wang, Huafeng
    Xia, Tao
    Li, Hanlin
    Gu, Xianfeng
    Lv, Weifeng
    Wang, Yuehai
    MATHEMATICS, 2021, 9 (24)
  • [35] Spatial-Temporal Exclusive Capsule Network for Open Set Action Recognition
    Feng, Yangbo
    Gao, Junyu
    Yang, Shicai
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9464 - 9478
  • [36] Spatial-temporal pyramid based Convolutional Neural Network for action recognition
    Zheng, Zhenxing
    An, Gaoyun
    Wu, Dapeng
    Ruan, Qiuqi
    NEUROCOMPUTING, 2019, 358 : 446 - 455
  • [37] AR3D: Attention Residual 3D Network for Human Action Recognition
    Dong, Min
    Fang, Zhenglin
    Li, Yongfa
    Bi, Sheng
    Chen, Jiangcheng
    SENSORS, 2021, 21 (05) : 1 - 15
  • [38] Multi-Branch Spatial-Temporal Attention Graph Convolution Network for Skeleton-based Action Recognition
    Wang, Daoshuai
    Li, Dewei
    Guan, Yaonan
    Wang, Gang
    Shao, Haibin
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022 : 6487 - 6492
  • [39] Spatial-temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention
    Cao, Haiwen
    Wu, Chunlei
    Lu, Jing
    Wu, Jie
    Wang, Leiquan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1173 - 1180
  • [40] Continuous Sign Language Recognition Based on Spatial-Temporal Graph Attention Network
    Guo, Qi
    Zhang, Shujun
    Li, Hui
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 134 (03) : 1653 - 1670