Residual attention fusion network for video action recognition

Cited: 0
Authors
Li, Ao [1 ]
Yi, Yang [1 ,2 ,3 ]
Liang, Daan [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Xinhua Coll, Guangzhou 510520, Peoples R China
[3] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510275, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Temporal modeling; Channel-wise attention; Pixel-wise attention; HISTOGRAMS; LSTM;
DOI
10.1016/j.jvcir.2023.103987
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Human action recognition in videos is a fundamental and important topic in computer vision, and modeling the spatial-temporal dynamics of a video is crucial for action classification. In this paper, a novel attention module named the Channel-wise Non-local Attention Module (CNAM) is proposed to highlight important features both spatially and temporally. In addition, another new attention module named the Channel-wise Attention Recalibration Module (CARM) is developed to capture discriminative features at the channel level. Building on these two attention modules, a novel convolutional neural network named the Residual Attention Fusion Network (RAFN) is proposed to model long-range temporal structure while simultaneously capturing more discriminative action features. More specifically, first, a sparse temporal sampling strategy is adopted to uniformly sample video frames along the temporal dimension as input to RAFN. Second, the attention modules CNAM and CARM are plugged into the residual network to highlight the important action regions around the actors. Finally, the classification scores of the four RAFN streams are combined by late fusion. Experimental results on HMDB51 and UCF101 demonstrate the effectiveness and excellent recognition performance of the proposed method.
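As a rough illustration of the pipeline described in the abstract, the PyTorch-style sketch below combines a sparse temporal sampling helper with a channel-wise recalibration block (in the spirit of CARM, here approximated by a squeeze-and-excitation design) inserted into a residual block. All module names, layer sizes, and the reduction ratio are illustrative assumptions rather than the authors' exact design; CNAM and the four-stream architecture are omitted for brevity.

```python
# Minimal sketch, assuming an SE-style channel recalibration as a stand-in for CARM.
# Not the authors' exact design: names, layer sizes, and the reduction ratio are assumptions.
import torch
import torch.nn as nn


def sparse_sample(num_frames: int, num_segments: int) -> list:
    """Split the video into equal segments and pick one frame index per segment."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


class ChannelRecalibration(nn.Module):
    """Channel-wise attention: pool spatially, then reweight each channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # (B, C, H, W) -> (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)                             # recalibrate channels


class AttentionResidualBlock(nn.Module):
    """Residual block with channel attention applied before the skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.attn = ChannelRecalibration(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.attn(self.body(x)))


if __name__ == "__main__":
    idx = sparse_sample(num_frames=300, num_segments=8)   # e.g. 8 sampled frame indices
    frames = torch.randn(len(idx), 64, 56, 56)            # fake per-frame feature maps
    block = AttentionResidualBlock(64)
    print(idx, block(frames).shape)                       # shape preserved: (8, 64, 56, 56)
```

In two-stream-style networks of this kind, the late fusion of per-stream classification scores is commonly implemented as a (possibly weighted) average of softmax outputs; the exact fusion weights used by RAFN are not given in this record.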
Pages: 10
Related Papers
50 records in total
  • [31] Residual Attention-based Fusion for Video Classification
    Pouyanfar, Samira
    Wang, Tianyi
    Chen, Shu-Ching
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 478 - 480
  • [32] Spatiotemporal Residual Networks for Video Action Recognition
    Feichtenhofer, Christoph
    Pinz, Axel
    Wildes, Richard P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [33] Residual network based on convolution attention model and feature fusion for dance motion recognition
    Shen, Dianhuai
    Jiang, Xueying
    Teng, Lin
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2022, 9 (04):
  • [34] Multi-scale Spatiotemporal Information Fusion Network for Video Action Recognition
    Cai, Yutong
    Lin, Weiyao
    See, John
    Cheng, Ming-Ming
    Liu, Guangcan
    Xiong, Hongkai
    2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018,
  • [35] R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition
    Liu, Quanle
    Che, Xiangjiu
    Bie, Mei
    IEEE ACCESS, 2019, 7 : 82246 - 82255
  • [36] Resstanet: deep residual spatio-temporal attention network for violent action recognition
    Pandey, A.
    Kumar, P.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY, 2024, 16 (05) : 2891 - 2900
  • [37] Spatiotemporal Fusion Networks for Video Action Recognition
    Liu, Zheng
    Hu, Haifeng
    Zhang, Junxuan
    NEURAL PROCESSING LETTERS, 2019, 50 (02) : 1877 - 1890
  • [39] Deep Fusion Module for Video Action Recognition
    Li, Yunyao
    Zheng, Zihao
    Zhou, Mingliang
    Yang, Guangchao
    Wei, Xuekai
    Pu, Huayan
    Luo, Jun
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2024, 33 (14)
  • [40] Context-Aware Memory Attention Network for Video-Based Action Recognition
    Koh, Thean Chun
    Yeo, Chai Kiat
    Vaitesswar, U. S.
    Jing, Xuan
    2022 IEEE 14TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2022,