Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition

Authors
Yaqing Hou
Hua Yu
Dongsheng Zhou
Pengfei Wang
Hongwei Ge
Jianxin Zhang
Qiang Zhang
Affiliations
[1] Dalian University of Technology, School of Computer Science and Technology
[2] Dalian University, School of Software Engineering
[3] Dalian Minzu University, School of Computer Science and Engineering
Keywords
Spatio-temporal attention networks; Spatial transformer network; Feature fusion; Human action recognition
Abstract
Two-stream networks have recently made substantial progress in human action recognition. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Specifically, with two-stream networks as the backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. The local and global features are then fused to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn pixel-wise correspondences. Finally, three outputs are combined to generate the global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted against traditional hand-engineered IDT algorithms, classical machine learning methods (e.g., SVM), and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). The proposed approach obtains the best performance among the compared methods, with accuracies of 95.3% and 72.9% on UCF101 and HMDB51, respectively. These results demonstrate the effectiveness of the proposed architecture for human action recognition.
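The abstract names compact bilinear pooling as the fusion operator but gives no details. As a rough illustration only, the following NumPy sketch shows the generic Count Sketch plus FFT formulation of compact bilinear pooling for two feature vectors. It is not the authors' implementation: the function names (`make_hash`, `count_sketch`, `compact_bilinear`), the output dimension, and the input sizes are all hypothetical choices made for this example.

```python
import numpy as np


def make_hash(n, d, rng):
    """Draw a random bucket index in [0, d) and a random sign for each of n inputs."""
    h = rng.integers(0, d, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    return h, s


def count_sketch(x, h, s, d):
    """Count Sketch projection: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add, so repeated buckets accumulate
    return out


def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Approximate the outer product of x and y in d dimensions.

    The circular convolution of the two sketches (computed via FFT) equals
    the Count Sketch of the outer product under the combined hash
    (hx[i] + hy[j]) mod d with sign sx[i] * sy[j].
    """
    fx = np.fft.rfft(count_sketch(x, hx, sx, d))
    fy = np.fft.rfft(count_sketch(y, hy, sy, d))
    return np.fft.irfft(fx * fy, n=d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64                               # illustrative output dimension
    local_feat = rng.normal(size=128)    # stand-in for a local feature
    global_feat = rng.normal(size=256)   # stand-in for a global feature
    hx, sx = make_hash(local_feat.size, d, rng)
    hy, sy = make_hash(global_feat.size, d, rng)
    fused = compact_bilinear(local_feat, global_feat, hx, sx, hy, sy, d)
    print(fused.shape)
```

The appeal of this formulation is that the fused descriptor has a fixed, small dimension `d` instead of the full outer-product size, which is presumably why it suits repeated fusion stages. In a network the hashes would be drawn once and kept fixed, and the operation would be applied per spatial location.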
Pages: 16439-16450
Page count: 11