Residual attention fusion network for video action recognition

Authors
Li, Ao [1]
Yi, Yang [1, 2, 3]
Liang, Daan [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Xinhua Coll, Guangzhou 510520, Peoples R China
[3] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510275, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Temporal modeling; Channel-wise attention; Pixel-wise attention; HISTOGRAMS; LSTM
DOI
10.1016/j.jvcir.2023.103987
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human action recognition in videos is a fundamental topic in computer vision, and modeling the spatio-temporal dynamics of a video is crucial for action classification. This paper proposes a new attention module, the Channel-wise Non-local Attention Module (CNAM), which highlights important features both spatially and temporally. A second new module, the Channel-wise Attention Recalibration Module (CARM), focuses on capturing discriminative features at the channel level. Building on these two modules, a convolutional neural network named Residual Attention Fusion Network (RAFN) is proposed to model long-range temporal structure while capturing more discriminative action features. First, a sparse temporal sampling strategy uniformly samples the video along the temporal dimension to form the input to RAFN. Second, CNAM and CARM are plugged into a residual network to highlight important action regions around actors. Finally, the classification scores of the four RAFN streams are combined by late fusion. Experimental results on HMDB51 and UCF101 demonstrate the effectiveness and strong recognition performance of the proposed method.
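The first and last steps of the pipeline described above (sparse temporal sampling and late fusion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how a frame is chosen within each segment, which four streams are used, or how they are weighted, so the function names `sparse_sample_indices` and `late_fusion`, the centre-frame and random-frame choices, and the equal default weights are all assumptions made for illustration.

```python
import math
import random


def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Split a video into equal-length segments and pick one frame per segment.

    Assumption: with rng=None the centre frame of each segment is used
    (deterministic); otherwise a random frame inside the segment is drawn.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments
        end = ((k + 1) * num_frames) // num_segments
        end = max(end, start + 1)  # keep at least one frame per segment
        if rng is None:
            idx = (start + end - 1) // 2
        else:
            idx = rng.randrange(start, end)
        indices.append(min(idx, num_frames - 1))
    return indices


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_scores, weights=None):
    """Fuse per-stream class scores by a weighted average of probabilities.

    Assumption: equal weights unless given; the paper's weights are not
    stated in the abstract.
    """
    if weights is None:
        weights = [1.0] * len(stream_scores)
    probs = [softmax(s) for s in stream_scores]
    weight_sum = sum(weights)
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / weight_sum
        for c in range(num_classes)
    ]


# Usage: 3 segments over a 30-frame clip, then fuse four hypothetical streams.
frame_ids = sparse_sample_indices(30, 3)  # [4, 14, 24]
streams = [
    [2.0, 1.0, 0.1],
    [1.5, 1.6, 0.2],
    [3.0, 0.5, 0.5],
    [0.2, 0.1, 0.0],
]
fused = late_fusion(streams)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than raw scores keeps streams with different score scales comparable. Whether RAFN fuses scores before or after normalization is not stated in the abstract.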
Pages: 10