SiamMAST: Siamese motion-aware spatio-temporal network for video action recognition

Times cited: 1
Authors
Lu, Xuemin [1 ,2 ]
Quan, Wei [2 ]
Reformat, Marek [3 ]
Zhao, Haiquan [2 ]
Chen, Jim X. [4 ]
Affiliations
[1] Southwest China Inst Elect Technol, Chengdu 610036, Peoples R China
[2] Southwest Jiaotong Univ, Sch Elect Engn, Chengdu 610031, Sichuan, Peoples R China
[3] Univ Alberta, Sch Elect & Comp Engn, Edmonton, AB T6G 1H9, Canada
[4] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
Source
VISUAL COMPUTER | 2024 / Vol. 40 / Issue 05
Funding
National Natural Science Foundation of China;
Keywords
Video action recognition; Siamese network; Spatio-temporal features; Spatial-motion awareness; Temporal-motion awareness; VECTOR;
DOI
10.1007/s00371-023-03018-2
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline code
081202; 0835;
Abstract
This paper proposes a Siamese motion-aware spatio-temporal network (SiamMAST) for video action recognition. SiamMAST processes video frames to extract and fuse four kinds of features of a moving target: spatial features, temporal features, spatial dynamic features, and temporal dynamic features. The network consists of AlexNet backbones, LSTMs, and two sub-modules for spatial motion awareness and temporal motion awareness. RGB frames are fed into the network, where the AlexNets extract spatial features; these spatial features are then passed to the LSTMs to generate temporal features. Additionally, the proposed spatial motion-awareness and temporal motion-awareness sub-modules capture spatial and temporal dynamic features, respectively. Finally, all four feature types are fused and fed into the classification layer. At test time, the final recognition result is obtained by averaging the predicted label probabilities over a fixed number of RGB frames and selecting the label with the highest average probability. The whole network is trained offline and end to end on large-scale image datasets using standard stochastic gradient descent (SGD) with back-propagation. The proposed network is evaluated on two challenging datasets, achieving recognition accuracies of 93.53% on UCF101 and 69.36% on HMDB51. The experiments demonstrate the effectiveness and efficiency of the proposed SiamMAST.
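The test-time decision rule described in the abstract (average the per-frame label probabilities over a fixed number of RGB frames, then pick the most probable label) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `softmax` and `video_label` are hypothetical, the per-frame scores are assumed to be raw logits from the classification layer, and nothing here reproduces the SiamMAST feature extractors or fusion step.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's class scores."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def video_label(frame_logits: Sequence[Sequence[float]]) -> int:
    """Average per-frame class probabilities and return the argmax class index.

    frame_logits: one row of class scores per sampled RGB frame
    (assumed input format; the caller fixes how many frames are sampled).
    """
    if not frame_logits:
        raise ValueError("need at least one frame")
    probs = [softmax(row) for row in frame_logits]
    num_classes = len(probs[0])
    # Mean probability of each class across all sampled frames.
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    # Label with the highest average probability.
    return max(range(num_classes), key=lambda c: mean[c])
```

For example, `video_label([[2.0, 0.5, 0.1], [0.3, 1.0, 0.2], [1.5, 0.4, 0.3]])` returns `0`: class 0 wins two of the three frames and also has the highest average probability. Averaging probabilities rather than taking a hard majority vote lets a frame with a very confident prediction outweigh frames that are only mildly in favor of another label.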
Pages: 3163-3181
Number of pages: 19
Related papers
  • [1] A motion-aware ConvLSTM network for action recognition
    Majd, Mahshid
    Safabakhsh, Reza
    [J]. Applied Intelligence, 2019, 49 (07): 2515-2521
  • [2] Spatio-Temporal Motion Aggregation Network for Video Action Detection
    Zhang, Hongcheng
    Zhao, Xu
    [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 2180-2184
  • [3] Motion-Aware Structured Light Using Spatio-Temporal Decodable Patterns
    Taguchi, Yuichi
    Agrawal, Amit
    Tuzel, Oncel
    [J]. Computer Vision - ECCV 2012, Part V, 2012, 7576: 832-845
  • [4] A Spatio-Temporal Motion Network for Action Recognition Based on Spatial Attention
    Yang, Qi
    Lu, Tongwei
    Zhou, Huabing
    [J]. Entropy, 2022, 24 (03)
  • [5] MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module
    Zhang, Yi
    [J]. Sensors, 2022, 22 (17)
  • [6] Motion-Aware Temporal Coherence for Video Resizing
    Wang, Yu-Shuen
    Fu, Hongbo
    Sorkine, Olga
    Lee, Tong-Yee
    Seidel, Hans-Peter
    [J]. ACM Transactions on Graphics, 2009, 28 (05): 1-10
  • [7] Efficient spatio-temporal network for action recognition
    Su, Yanxiong
    Zhao, Qian
    [J]. Journal of Real-Time Image Processing, 2024, 21 (05)
  • [8] Spatio-temporal Video Autoencoder for Human Action Recognition
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    [J]. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), Vol. 5, 2019: 114-123