Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition

Authors
Yaqing Hou
Hua Yu
Dongsheng Zhou
Pengfei Wang
Hongwei Ge
Jianxin Zhang
Qiang Zhang
Affiliations
[1] Dalian University of Technology, School of Computer Science and Technology
[2] Dalian University, School of Software Engineering
[3] Dalian Minzu University, School of Computer Science and Engineering
Source
Neural Computing & Applications, 2021, 33(23)
Keywords
Spatio-temporal attention networks; Spatial transformer network; Feature fusion; Human action recognition
DOI
Not available
Abstract
In human action recognition, two-stream networks have recently made substantial progress. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Specifically, with a two-stream network as the backbone, the spatial network first applies multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. The local and global features are then fused to enhance the representation of human actions. Next, the output of the spatial network is fused with the temporal information at a particular layer to learn pixel-wise correspondences. Finally, three outputs are combined to generate global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted against the traditional hand-engineered IDT method, classical machine learning methods (e.g., SVM) and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). The results show that the proposed approach achieves the best performance among existing works, with accuracies of 95.3% on UCF101 and 72.9% on HMDB51. These results demonstrate the superiority and significance of the proposed architecture for human action recognition.
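The abstract names compact bilinear pooling as the operator behind the multi-stage fusion but does not give its form. As a hedged illustration, the minimal Python sketch below assumes the widely used Tensor Sketch formulation: each feature vector is projected by a random count sketch, the two sketches are circularly convolved (which equals a count sketch of their outer product), and the results are summed over spatial locations. The function names, the toy dimensions, the treatment of `xs` and `ys` as per-location features from two streams, and the signed-square-root plus L2 normalization step are all assumptions for illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
SketchParams = Tuple[List[int], List[int]]  # (bucket h(i), sign s(i)) per input index


def make_count_sketch_params(in_dim: int, out_dim: int, seed: int) -> SketchParams:
    """Draw a random bucket h(i) in [0, out_dim) and a sign s(i) in {-1, +1} per input index."""
    rng = random.Random(seed)
    h = [rng.randrange(out_dim) for _ in range(in_dim)]
    s = [rng.choice((-1, 1)) for _ in range(in_dim)]
    return h, s


def count_sketch(x: Sequence[float], h: List[int], s: List[int], out_dim: int) -> Vector:
    """Project x to out_dim dimensions by adding s(i) * x[i] into bucket h(i)."""
    out = [0.0] * out_dim
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def circular_convolve(a: Vector, b: Vector) -> Vector:
    """Direct O(d^2) circular convolution; practical code would use FFTs instead."""
    d = len(a)
    return [sum(a[k] * b[(n - k) % d] for k in range(d)) for n in range(d)]


def compact_bilinear_pool(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    params_x: SketchParams,
    params_y: SketchParams,
    out_dim: int,
) -> Vector:
    """Approximate the sketch of sum_l outer(x_l, y_l) over spatial locations l.

    Tensor Sketch is linear, so the pooled sketch is the sum of per-location sketches.
    """
    hx, sx = params_x
    hy, sy = params_y
    pooled = [0.0] * out_dim
    for x, y in zip(xs, ys):
        sketch = circular_convolve(
            count_sketch(x, hx, sx, out_dim), count_sketch(y, hy, sy, out_dim)
        )
        for n, v in enumerate(sketch):
            pooled[n] += v
    # Signed square root followed by L2 normalization (assumed post-processing).
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0
    return [v / norm for v in signed]
```

The direct convolution keeps the example dependency-free and easy to check; a practical version would compute it with FFTs over batched tensors and use output dimensions in the thousands.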
Pages: 16439-16450 (12 pages)
Related papers
  • [1] Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition
    Hou, Yaqing
    Yu, Hua
    Zhou, Dongsheng
    Wang, Pengfei
    Ge, Hongwei
    Zhang, Jianxin
    Zhang, Qiang
    Neural Computing & Applications, 2021, 33(23): 16439-16450
  • [2] Human Action Recognition via Spatio-temporal Dual Network Flow and Visual Attention Fusion
    Liu Tianliang
    Qiao Qingwei
    Wan Junwei
    Dai Xiubin
    Luo Jiebo
    Journal of Electronics & Information Technology, 2018, 40(10): 2395-2401
  • [3] Action recognition method of spatio-temporal feature fusion deep learning network
    Pei, Xiaomin
    Fan, Huijie
    Tang, Yandong
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2018, 47(02)
  • [4] Spatio-temporal Multi-level Fusion for Human Action Recognition
    Manh-Hung Lu
    Thi-Oanh Nguyen
    SOICT 2019: Proceedings of the Tenth International Symposium on Information and Communication Technology, 2019: 298-305
  • [5] Attention Guided Food Recognition via Multi-Stage Local Feature Fusion
    Deng, Gonghui
    Wu, Dunzhi
    Chen, Weizhen
    CMC-Computers Materials & Continua, 2024, 80(02): 1985-2003
  • [6] HASTF: a hybrid attention spatio-temporal feature fusion network for EEG emotion recognition
    Hu, Fangzhou
    Wang, Fei
    Bi, Jinying
    An, Zida
    Chen, Chao
    Qu, Gangguo
    Han, Shuai
    Frontiers in Neuroscience, 2024, 18
  • [7] MLSTIF: multi-level spatio-temporal and human-object interaction feature fusion network for spatio-temporal action detection
    Rui Yang
    Hui Zhang
    Mulan Qiu
    Min Wang
    Multimedia Systems, 2025, 31(3)
  • [8] Local Feature Fusion Temporal Convolutional Network for Human Action Recognition
    Song Z.
    Zhou Y.
    Jia J.
    Xin S.
    Liu Y.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32(03): 418-424
  • [9] STCA: an action recognition network with spatio-temporal convolution and attention
    Tian, Qiuhong
    Miao, Weilun
    Zhang, Lizao
    Yang, Ziyu
    Yu, Yang
    Zhao, Yanying
    Yao, Lan
    International Journal of Multimedia Information Retrieval, 2025, 14(01)
  • [10] Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature
    Indhumathi, C.
    Murugan, V.
    Muthulakshmii, G.
    International Journal of Image and Graphics, 2022, 22(05)