Spatiotemporal attention enhanced features fusion network for action recognition

Cited by: 0
Authors
Danfeng Zhuang
Min Jiang
Jun Kong
Tianshan Liu
Affiliations
[1] Jiangnan University, Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence
[2] The Hong Kong Polytechnic University, Department of Electronic and Information Engineering
Keywords
Action recognition; Three-stream; Spatiotemporal attention; Features fusion
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
In recent years, action recognition has become a popular and challenging task in computer vision. Two-stream networks, which combine an appearance stream and a motion stream, make joint judgments and achieve excellent action classification results. However, many of these networks fuse features or scores in a simple way and do not exploit the characteristics of the individual streams effectively. In addition, some networks do not fully use the spatial context and temporal information. In this paper, a novel three-stream network, the spatiotemporal attention enhanced features fusion network for action recognition, is proposed. First, a features fusion stream, which includes multi-level features fusion blocks, is designed to train the two streams jointly and to complement the two-stream network. Second, we model the channel features obtained from spatial context to enhance the ability to extract useful spatial semantic features at different levels. Third, a temporal attention module that models temporal information makes the extracted temporal features more representative. Extensive experiments on the UCF101 and HMDB51 datasets verify the effectiveness of the proposed network for action recognition.
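The three ideas named in the abstract (a fusion stream, channel weighting, and temporal attention) can be illustrated with a toy sketch. This is not the authors' network: the abstract gives no layer definitions, so every function below (`fuse`, `channel_gate`, `temporal_pool`) is a hypothetical, parameter-free stand-in for a learned module, and the inputs are made-up per-frame feature vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def channel_gate(frames):
    """Rescale each channel by a sigmoid of its average over all frames.

    Assumption: a parameter-free stand-in for learned channel attention.
    """
    num_channels = len(frames[0])
    means = [sum(f[c] for f in frames) / len(frames) for c in range(num_channels)]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[f[c] * gates[c] for c in range(num_channels)] for f in frames]

def temporal_pool(frames):
    """Attention-weighted average over time.

    Assumption: a frame's score is simply its mean activation.
    """
    weights = softmax([sum(f) / len(f) for f in frames])
    num_channels = len(frames[0])
    return [sum(w * f[c] for w, f in zip(weights, frames)) for c in range(num_channels)]

def fuse(appearance, motion):
    """Element-wise sum of two same-shaped frame sequences (hypothetical fusion rule)."""
    return [[a + m for a, m in zip(fa, fm)] for fa, fm in zip(appearance, motion)]

# Toy inputs: 3 frames, 2 channels per stream.
appearance = [[0.2, 1.0], [0.4, 0.8], [0.6, 0.6]]
motion = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Third stream built from the other two, then one descriptor per stream.
fused = fuse(appearance, motion)
descriptors = [temporal_pool(channel_gate(s)) for s in (appearance, motion, fused)]
```

In a real model each stand-in would be replaced by trained layers, and the three per-stream descriptors would feed classifiers whose scores are combined.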
Pages: 823-841
Page count: 18
Related papers
50 in total
  • [1] Spatiotemporal attention enhanced features fusion network for action recognition
    Zhuang, Danfeng
    Jiang, Min
    Kong, Jun
    Liu, Tianshan
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (03) : 823 - 841
  • [2] Deep learning network model based on fusion of spatiotemporal features for action recognition
    Yang, Ge
    Zou, Wu-xing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (07) : 9875 - 9896
  • [4] Spatiotemporal information deep fusion network with frame attention mechanism for video action recognition
    Ou, Hongshi
    Sun, Jifeng
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (02)
  • [5] Multiple depth-levels features fusion enhanced network for action recognition
    Wang, Shengquan
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2020, 73
  • [6] Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition
    Shi, Zhensheng
    Cao, Liangjie
    Guan, Cheng
    Zheng, Haiyong
    Gu, Zhaorui
    Yu, Zhibin
    Zheng, Bing
    [J]. IEEE ACCESS, 2020, 8 : 16785 - 16794
  • [7] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [8] A Spatiotemporal Fusion Network For Skeleton-Based Action Recognition
    Bao, Wenxia
    Wang, Junyi
    Yang, Xianjun
    Chen, Hemu
    [J]. 2024 3RD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MEDIA COMPUTING, ICIPMC 2024, 2024, : 347 - 352
  • [9] Feature Fusion of Deep Spatial Features and Handcrafted Spatiotemporal Features for Human Action Recognition
    Uddin, Md Azher
    Lee, Young-Koo
    [J]. SENSORS, 2019, 19 (07)
  • [10] Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
    Liang, Chaolei
    Zou, Wei
    Hu, Danfeng
    Wang, JiaJun
    [J]. 2024 5TH INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKS AND INTERNET OF THINGS, CNIOT 2024, 2024, : 600 - 605