Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition

Times cited: 0

Authors:

Yaqing Hou

Hua Yu

Dongsheng Zhou

Pengfei Wang

Hongwei Ge

Jianxin Zhang

Qiang Zhang

Affiliations:

[1] Dalian University of Technology, School of Computer Science and Technology

[2] Dalian University, School of Software Engineering

[3] Dalian Minzu University, School of Computer Science and Engineering

Source:

Neural Computing and Applications | 2021 / Vol. 33, No. 23

Keywords:

Spatio-temporal attention networks; Spatial transformer network; Feature fusion; Human action recognition

DOI:

Not available

Abstract:

In human action recognition, two-stream networks have recently made substantial progress. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Specifically, using a two-stream network as the backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. Local and global features are then fused to enhance the representation of human actions. Next, the output of the spatial network and the temporal information are fused at a specific layer to learn pixel-wise correspondences between them. Finally, the three resulting outputs are combined to generate global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted against traditional hand-crafted methods (improved dense trajectories, IDT), classical machine learning methods (e.g., SVM) and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). The results show that our approach achieves the best performance among the compared methods, with accuracies of 95.3% on UCF101 and 72.9% on HMDB51, demonstrating the effectiveness of the proposed architecture for human action recognition.
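The abstract names compact bilinear pooling as the fusion operator but gives no implementation details. As a rough illustration only, the sketch below shows the Tensor Sketch form of compact bilinear pooling fusing two spatially aligned feature maps, for example a local (region-cropped) feature map and a global one. Everything here is an assumption rather than the authors' method: the function names (`make_count_sketch`, `compact_bilinear`, `fuse_feature_maps`), the toy dimensions, sum pooling over locations, and the signed-square-root plus L2 normalization step (common in bilinear pooling, but not confirmed by the abstract).

```python
import math
import random
from typing import List, Sequence, Tuple

# A count sketch is defined by a hash h (input index -> output bucket)
# and a random sign s (input index -> -1 or +1).
CountSketch = Tuple[List[int], List[int]]


def make_count_sketch(in_dim: int, out_dim: int, rng: random.Random) -> CountSketch:
    """Draw a random hash and sign for projecting in_dim -> out_dim."""
    h = [rng.randrange(out_dim) for _ in range(in_dim)]
    s = [rng.choice((-1, 1)) for _ in range(in_dim)]
    return h, s


def count_sketch(x: Sequence[float], sketch: CountSketch, out_dim: int) -> List[float]:
    """Project x by adding s[i] * x[i] into bucket h[i]."""
    h, s = sketch
    y = [0.0] * out_dim
    for i, xi in enumerate(x):
        y[h[i]] += s[i] * xi
    return y


def circular_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Direct O(d^2) circular convolution; practical code uses FFTs instead."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def compact_bilinear(x: Sequence[float], y: Sequence[float],
                     sk_x: CountSketch, sk_y: CountSketch, out_dim: int) -> List[float]:
    """Tensor Sketch of the outer product of x and y, without forming it explicitly."""
    return circular_convolve(count_sketch(x, sk_x, out_dim), count_sketch(y, sk_y, out_dim))


def signed_sqrt_l2(v: Sequence[float]) -> List[float]:
    """Signed square root followed by L2 normalization (assumed post-processing)."""
    r = [math.copysign(math.sqrt(abs(t)), t) for t in v]
    norm = math.sqrt(sum(t * t for t in r)) or 1.0
    return [t / norm for t in r]


def fuse_feature_maps(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]],
                      sk_x: CountSketch, sk_y: CountSketch, out_dim: int) -> List[float]:
    """Fuse two spatially aligned feature maps given as per-location vectors.

    Per-location sketches are summed over locations (sum pooling), then normalized.
    """
    acc = [0.0] * out_dim
    for x, y in zip(xs, ys):
        for k, v in enumerate(compact_bilinear(x, y, sk_x, sk_y, out_dim)):
            acc[k] += v
    return signed_sqrt_l2(acc)


if __name__ == "__main__":
    rng = random.Random(0)
    d_local, d_global, d_out = 4, 3, 8  # hypothetical toy sizes
    sk_local = make_count_sketch(d_local, d_out, rng)
    sk_global = make_count_sketch(d_global, d_out, rng)
    local_map = [[0.5, -1.0, 2.0, 0.25], [1.0, 0.0, -0.5, 0.75]]  # 2 locations
    global_map = [[1.5, 0.0, -2.0], [0.2, 1.0, 0.3]]
    fused = fuse_feature_maps(local_map, global_map, sk_local, sk_global, d_out)
    print(len(fused), [round(t, 3) for t in fused])
```

Because the Tensor Sketch is linear in each argument, summing the per-location sketches equals sketching the summed outer products, which is how compact bilinear pooling approximates full bilinear pooling with a much smaller output vector. Replacing `circular_convolve` with an FFT, an elementwise product and an inverse FFT gives the same result in O(d log d). The same operator could, in principle, be applied location-wise to spatial and temporal feature maps, but how the paper does this is not specified in the abstract.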

Pages: 16439–16450

Page count: 11

Citation:

Hou, Yaqing; Yu, Hua; Zhou, Dongsheng; Wang, Pengfei; Ge, Hongwei; Zhang, Jianxin; Zhang, Qiang. Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition. Neural Computing & Applications, 2021, 33(23): 16439–16450.