3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Citations: 0
Authors
Yang, Jun [1, 2]
Sun, Shulong [2]
Chen, Jiayue [1]
Xie, Haizhen [1]
Wang, Yan [1]
Yang, Zenglong [1]
Affiliations
[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China
[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 16
Funding
National Natural Science Foundation of China;
Keywords
action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;
DOI
10.3390/app14167154
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Existing skeleton-based action recognition methods suffer from insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model rests on three main innovations: (1) Conversion of skeleton points to heat maps. A Gaussian transform converts skeleton point data into heat maps, which reduces the model's strong dependence on the raw skeleton point data and improves the stability and robustness of the data. (2) A spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism focuses on extracting key frames and key areas within frames, which strengthens the model's ability to identify behavioral patterns. (3) A multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission in the network, addresses the vanishing-gradient problem in deep networks, and helps improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method addresses the robustness shortcomings of existing methods and improves the capture of spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
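The first innovation, converting skeleton points to heat maps with a Gaussian transform, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `joints_to_heatmaps` and `clip_to_volume`, the value of `sigma`, the use of 2D pixel coordinates, and the (K, T, H, W) output layout are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """Render one Gaussian heat map per joint (hypothetical helper).

    joints: array-like of shape (K, 2) holding (x, y) pixel coordinates.
    Returns an array of shape (K, height, width) with a peak of 1.0 at
    each joint location. sigma is an assumed spread, not from the paper.
    """
    joints = np.asarray(joints, dtype=np.float64)
    ys = np.arange(height, dtype=np.float64)[None, :, None]  # (1, H, 1)
    xs = np.arange(width, dtype=np.float64)[None, None, :]   # (1, 1, W)
    cx = joints[:, 0][:, None, None]                         # (K, 1, 1)
    cy = joints[:, 1][:, None, None]                         # (K, 1, 1)
    # Squared distance from every pixel to each joint, via broadcasting.
    dist_sq = (xs - cx) ** 2 + (ys - cy) ** 2                # (K, H, W)
    return np.exp(-dist_sq / (2.0 * sigma ** 2))


def clip_to_volume(frames, height, width, sigma=2.0):
    """Stack per-frame heat maps into a (K, T, H, W) volume.

    frames: array-like of shape (T, K, 2). The channel-first layout is an
    assumption about what a 3D CNN would consume.
    """
    maps = [joints_to_heatmaps(f, height, width, sigma) for f in frames]
    return np.stack(maps, axis=1)


if __name__ == "__main__":
    # Two frames, two joints each, rendered on a 5 x 6 grid.
    clip = [[[3, 2], [0, 0]], [[4, 2], [1, 0]]]
    volume = clip_to_volume(clip, height=5, width=6)
    print(volume.shape)
```

Because each joint becomes a smooth blob rather than a single coordinate, small errors in the joint position shift the map only slightly, which is one plausible reading of the robustness claim in the abstract.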
Pages: 13