3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Cited by: 0

Authors:

Yang, Jun ^{[1,2]}

Sun, Shulong ^{[2]}

Chen, Jiayue ^{[1]}

Xie, Haizhen ^{[1]}

Wang, Yan ^{[1]}

Yang, Zenglong ^{[1]}

Affiliations:

[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China

[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 16

Funding:

National Natural Science Foundation of China;

Keywords:

action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;

DOI:

10.3390/app14167154

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Existing skeleton-based action recognition methods suffer from insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model improves action recognition performance through three main innovations: (1) Conversion of skeleton points to heat maps. A Gaussian transform converts skeleton point data into heat maps, which reduces the model's strong dependence on the original skeleton point data and enhances the stability and robustness of the data. (2) A spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism focuses on extracting key frames and key areas within frames, which enhances the model's ability to identify behavioral patterns. (3) A multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission in the network, addresses the vanishing-gradient problem in deep networks, and helps improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method both addresses the robustness shortcomings of existing methods and improves the capture of spatiotemporal features, providing an efficient and widely applicable solution for skeleton-based action recognition.
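
The abstract does not give the exact form of the Gaussian transform, so the following is only a minimal sketch of one common way to render skeleton points as heat maps: each joint becomes a 2D Gaussian centered on its pixel coordinates, and the per-frame maps are stacked along time into a volume that a 3D CNN could consume. The function names `joints_to_heatmaps` and `clip_to_volume`, the `sigma` value, and the optional confidence column are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """Render one Gaussian heat map per joint (hypothetical helper).

    joints: array of shape (K, 2) or (K, 3) holding (x, y[, confidence])
            in pixel coordinates. The confidence column is an assumption.
    Returns an array of shape (K, height, width) with values in [0, 1].
    """
    joints = np.asarray(joints, dtype=np.float64)
    xs = np.arange(width, dtype=np.float64)[None, None, :]   # (1, 1, W)
    ys = np.arange(height, dtype=np.float64)[None, :, None]  # (1, H, 1)
    cx = joints[:, 0][:, None, None]                          # (K, 1, 1)
    cy = joints[:, 1][:, None, None]                          # (K, 1, 1)
    # Unnormalized isotropic Gaussian: peak value 1 at the joint location.
    heatmaps = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    if joints.shape[1] > 2:
        # Scale each map by the joint's detection confidence, if provided.
        heatmaps = heatmaps * joints[:, 2][:, None, None]
    return heatmaps


def clip_to_volume(frames, height, width, sigma=2.0):
    """Stack per-frame heat maps into a (K, T, H, W) volume."""
    per_frame = [joints_to_heatmaps(f, height, width, sigma) for f in frames]
    return np.stack(per_frame, axis=1)


if __name__ == "__main__":
    # Two joints tracked over five identical frames on a small 8x16 canvas.
    frame = np.array([[3.0, 4.0], [10.0, 2.0]])
    volume = clip_to_volume([frame] * 5, height=8, width=16)
    print(volume.shape)
```

Because the network sees a smooth blob instead of an exact coordinate, a small error in a joint position shifts the blob slightly rather than changing the input abruptly, which is one plausible reading of the robustness benefit the abstract describes.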

Pages: 13
