Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition

Cited by: 0
Authors
Yaqing Hou
Hua Yu
Dongsheng Zhou
Pengfei Wang
Hongwei Ge
Jianxin Zhang
Qiang Zhang
Affiliations
[1] Dalian University of Technology,School of Computer Science and Technology
[2] Dalian University,School of Software Engineering
[3] Dalian Minzu University,School of Computer Science and Engineering
Source
Keywords
Spatio-temporal attention networks; Spatial transformer network; Feature fusion; Human action recognition;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
In human action recognition, two-stream networks have recently made excellent progress. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Specifically, taking two-stream networks as the backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. The local and global features are then fused to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn pixel-wise correspondences. Finally, the three outputs are combined to generate the global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted against traditional hand-engineered IDT algorithms, classical machine learning methods (e.g., SVM), and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). The proposed approach achieves the best performance among the compared methods, with accuracies of 95.3% and 72.9% on UCF101 and HMDB51, respectively. These results demonstrate the superiority of the proposed architecture for human action recognition.
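The abstract names compact bilinear pooling as the fusion operator but does not say how it is computed. The sketch below is one common formulation (the Tensor Sketch variant: Count Sketch projections combined by circular convolution via FFT), shown on two plain feature vectors. It is an illustration under assumptions, not the authors' implementation: the function names, the dimensions, and the use of NumPy are all hypothetical, and a real network would apply this per spatial location of the feature maps and then pool.

```python
import numpy as np


def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random hash (bucket index) and a random sign per input dimension.

    Hypothetical helper: these are fixed once and reused for every input.
    """
    h = rng.integers(0, out_dim, size=in_dim)   # bucket for each input index
    s = rng.choice([-1.0, 1.0], size=in_dim)    # sign for each input index
    return h, s


def count_sketch(x, h, s, out_dim):
    """Project x into out_dim buckets: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(out_dim)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate
    return out


def compact_bilinear(x, y, params_x, params_y, out_dim):
    """Approximate the outer product of x and y in out_dim dimensions.

    The circular convolution of the two count sketches (computed here as an
    element-wise product in the frequency domain) equals a count sketch of
    the full outer product x y^T.
    """
    sx = count_sketch(x, *params_x, out_dim)
    sy = count_sketch(y, *params_y, out_dim)
    return np.fft.irfft(np.fft.rfft(sx) * np.fft.rfft(sy), n=out_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_feat = rng.standard_normal(512)    # stand-in for a local (attended) feature
    global_feat = rng.standard_normal(512)   # stand-in for a global feature
    out_dim = 1024                           # illustrative output size
    px = make_sketch_params(local_feat.size, out_dim, rng)
    py = make_sketch_params(global_feat.size, out_dim, rng)
    fused = compact_bilinear(local_feat, global_feat, px, py, out_dim)
    print(fused.shape)
```

The fused vector has `out_dim` entries rather than the 512 x 512 entries of an explicit outer product, which is the main reason this kind of pooling is attractive for fusing two feature streams.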
Pages: 16439-16450
Page count: 11
Related papers
50 in total
  • [21] A multi-stage feature fusion defogging network based on the attention mechanism
    Song, Yuqin
    Zhao, Jitao
    Shang, Chunliang
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (04): : 4577 - 4599
  • [23] Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
    Borzeshi, Ehsan Zare
    Concha, Oscar Perez
    Piccardi, Massimo
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2012, 7626 : 474 - 482
  • [24] Efficient spatio-temporal network for action recognition
    Su, Yanxiong
    Zhao, Qian
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2024, 21 (05)
  • [25] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [26] Resstanet: deep residual spatio-temporal attention network for violent action recognition
    Pandey A.
    Kumar P.
    International Journal of Information Technology, 2024, 16 (5) : 2891 - 2900
  • [27] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [28] Multi-Attention Spatio-Temporal Feature Extraction Network for Image Deraining
    Feng, Shangyu
    Yin, Dejun
    Sun, Shijun
    2023 8th International Conference on Intelligent Computing and Signal Processing, ICSP 2023, 2023, : 1804 - 1809
  • [29] Human action recognition using Local Spatio-Temporal Discriminant Embedding
    Jia, Kui
    Yeung, Dit-Yan
    2008 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-12, 2008, : 3040 - +
  • [30] Local Spatio-Temporal Interest Point Detection for Human Action Recognition
    Li, Feng
    Du, Jixiang
    2012 IEEE FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2012, : 579 - 582