3D-STARNET: Spatial-Temporal Attention Residual Network for Robust Action Recognition

Citations: 0

Authors:

Yang, Jun [1,2]

Sun, Shulong [2]

Chen, Jiayue [1]

Xie, Haizhen [1]

Wang, Yan [1]

Yang, Zenglong [1]

Affiliations:

[1] China Univ Min & Technol, Big Data & Internet Things Res Ctr, Beijing 100083, Peoples R China

[2] Minist Emergency Management, Key Lab Intelligent Min & Robot, Beijing 100083, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 16

Funding:

National Natural Science Foundation of China;

Keywords:

action recognition; spatiotemporal attention; multi-staged residual; skeleton; 3D CNN;

DOI:

10.3390/app14167154

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Existing skeleton-based action recognition methods suffer from insufficient spatiotemporal feature mining and inefficient information transmission. To address these problems, this paper proposes the Spatial-Temporal Attention Residual Network for 3D human action recognition (3D-STARNET). The model improves action recognition performance through three main innovations: (1) Conversion from skeleton points to heat maps. A Gaussian transform converts skeleton point data into heat maps, which reduces the model's strong dependence on the original skeleton point data and enhances the stability and robustness of the data. (2) A spatiotemporal attention mechanism (STA). A novel spatiotemporal attention mechanism focuses on extracting key frames and key areas within frames, which enhances the model's ability to identify behavioral patterns. (3) A multi-stage residual structure (MS-Residual). The multi-stage residual structure improves the efficiency of data transmission in the network, addresses the vanishing gradient problem in deep networks, and helps to improve the recognition efficiency of the model. Experimental results on the NTU-RGBD120 dataset show that 3D-STARNET improves action recognition accuracy, with the overall network reaching a top-1 accuracy of 96.74%. The method addresses the robustness shortcomings of existing methods and improves the ability to capture spatiotemporal features, providing an efficient and widely applicable solution for action recognition based on skeletal data.
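The abstract does not give the details of the skeleton-to-heat-map conversion, so the following is only a minimal sketch of one common way to do it: each joint is rendered as a 2D Gaussian centered on its pixel coordinates, and the per-frame maps are stacked over time. The function names `joint_to_heatmap` and `clip_to_heatmaps`, the `sigma` value, and the map resolution are all illustrative assumptions, not taken from the paper.

```python
import math


def joint_to_heatmap(x, y, height, width, sigma=2.0):
    """Render one joint at pixel (x, y) as a height x width Gaussian heat map.

    Assumption: an isotropic Gaussian with a hand-picked sigma; the paper's
    actual transform and parameters are not stated in the abstract.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / two_sigma_sq)
            for col in range(width)
        ]
        for row in range(height)
    ]


def clip_to_heatmaps(clip, height, width, sigma=2.0):
    """Convert a clip of skeleton frames into stacked heat maps.

    clip is a list of frames; each frame is a list of (x, y) joint positions.
    The result is indexed as [frame][joint][row][col].
    """
    return [
        [joint_to_heatmap(x, y, height, width, sigma) for (x, y) in frame]
        for frame in clip
    ]
```

In this sketch a small error in a joint coordinate shifts a smooth blob rather than changing a single hard value, which is one plausible reading of the reduced dependence on raw skeleton points that the abstract describes. A production version would typically use an array library and vectorize the computation.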


Pages: 13
