Select and Focus: Action Recognition with Spatial-Temporal Attention

Cited by: 0

Authors:

Chan, Wensong [1]

Tian, Zhiqiang [1]

Liu, Shuai [1]

Ren, Jing [2]

Lan, Xuguang [3]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Peoples R China

[2] Xian Aeronaut Univ, Xian, Peoples R China

[3] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China

Source:

INTELLIGENT ROBOTICS AND APPLICATIONS, ICIRA 2019, PT III | 2019 / Vol. 11742

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Human action recognition; Deep learning; Attention;

DOI:

10.1007/978-3-030-27535-8_41

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of neural networks, human action recognition has improved greatly through the use of convolutional neural networks (CNNs) or recurrent neural networks (RNNs). In this paper, we propose a model based on weighted spatial-temporal attention for action recognition. The model selects the key parts of each video frame and the important frames of each video sequence, and then focuses its analysis on these key parts and frames. The most important task of our model is therefore to find the key parts spatially and the important frames temporally for recognizing the action. Our model is trained and tested on three datasets: UCF-11, UCF-101, and HMDB51. The experiments demonstrate that our model achieves satisfactory results for human action recognition.
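The abstract does not say how the attention weights are computed, so the following is only a minimal sketch of the general idea of weighted spatial-temporal attention, not the authors' model. It assumes soft attention: a softmax over per-region scores within each frame (spatial), then a softmax over per-frame scores across the sequence (temporal). All function names, the score inputs, and the feature layout are hypothetical; in a trained model the scores would come from learned layers on top of CNN or RNN features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors, weights):
    """Weighted sum of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]


def spatial_temporal_attention(video, region_scores, frame_scores):
    """Pool a video into one feature vector with soft attention.

    video: list of frames; each frame is a list of region feature vectors.
    region_scores: for each frame, one relevance score per region
        (assumed input; a real model would learn these).
    frame_scores: one relevance score per frame (assumed input).
    Returns (video_feature, spatial_weights, temporal_weights).
    """
    # Spatial attention: weight the regions inside each frame.
    spatial_weights = [softmax(s) for s in region_scores]
    frame_features = [
        weighted_sum(frame, w) for frame, w in zip(video, spatial_weights)
    ]
    # Temporal attention: weight the frames of the sequence.
    temporal_weights = softmax(frame_scores)
    video_feature = weighted_sum(frame_features, temporal_weights)
    return video_feature, spatial_weights, temporal_weights


# Toy input: 2 frames, 2 regions per frame, 2-dimensional features.
toy_video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [4.0, 4.0]],
]
toy_feature, toy_spatial, toy_temporal = spatial_temporal_attention(
    toy_video,
    region_scores=[[0.0, 0.0], [0.0, 0.0]],
    frame_scores=[0.0, 0.0],
)
```

With equal scores the weights are uniform, so the result reduces to plain averaging; raising one region's or frame's score shifts the pooled feature toward it, which is the "select and focus" behavior the abstract describes.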

Pages: 461-471

Page count: 11
