Video Action Recognition Based on Spatio-temporal Feature Pyramid Module

Cited by: 0
Authors
Gong, Suming [1]
Chen, Ying [1]
Affiliations
[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Dilated convolution; Spatiotemporal feature pyramid;
DOI
10.1109/ISCID51228.2020.00082
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Modeling the spatio-temporal information of different actions facilitates their recognition. Mainstream 2D convolutional networks have low computational cost but cannot capture temporal information; mainstream 3D convolutional networks can extract spatio-temporal features but are computationally expensive and difficult to deploy. In this paper, a Spatiotemporal Feature Pyramid Module (STFPM) is proposed to extract spatio-temporal feature information. STFPM captures temporal information between frames by dilated convolution and fuses feature information by weighted addition. STFPM can be flexibly inserted into a 2D backbone network in a plug-and-play manner. When equipped with STFPM, 2D ResNet-50 achieves good results on the UCF101 and HMDB51 datasets.
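The abstract names two operations, dilated convolution along the time axis and fusion by weighted addition, but this record gives no further detail. The following is only a minimal sketch of that general idea, not the authors' module. Everything specific in it is an assumption made for illustration: the function names, the three-tap kernel shared across channels, the dilation rates 1, 2 and 3, zero padding at the clip boundaries, and softmax-normalized branch weights.

```python
import math


def dilated_temporal_conv(frames, kernel, dilation):
    """Convolve along time with a given dilation and zero padding.

    frames: list of T per-frame feature vectors (each a list of C floats).
    kernel: odd-length list of taps, shared across channels (assumption).
    """
    num_frames, channels = len(frames), len(frames[0])
    half = len(kernel) // 2
    out = []
    for t in range(num_frames):
        acc = [0.0] * channels
        for k, tap in enumerate(kernel):
            src = t + (k - half) * dilation  # frame index this tap reads
            if 0 <= src < num_frames:  # taps outside the clip read zeros
                for c in range(channels):
                    acc[c] += tap * frames[src][c]
        out.append(acc)
    return out


def softmax(values):
    """Normalize branch scores into positive weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pyramid_fuse(frames, kernel, dilations, branch_logits):
    """Run one temporal branch per dilation rate, then fuse the branches
    by weighted addition (hypothetical fusion rule, for illustration)."""
    weights = softmax(branch_logits)
    branches = [dilated_temporal_conv(frames, kernel, d) for d in dilations]
    num_frames, channels = len(frames), len(frames[0])
    fused = [[0.0] * channels for _ in range(num_frames)]
    for weight, branch in zip(weights, branches):
        for t in range(num_frames):
            for c in range(channels):
                fused[t][c] += weight * branch[t][c]
    return fused


# Toy input: 8 frames, 2 channels (a ramp over time and a constant).
frames = [[float(t), 1.0] for t in range(8)]
fused = pyramid_fuse(
    frames,
    kernel=[0.25, 0.5, 0.25],
    dilations=[1, 2, 3],
    branch_logits=[0.0, 0.0, 0.0],
)
```

Larger dilation rates let a branch compare frames that are further apart without adding taps, which is why a set of rates can act as a temporal pyramid. In a real network the taps and branch weights would be learned, and the input would be feature maps from the 2D backbone rather than short vectors.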
Pages: 338-341
Page count: 4
Related papers
50 in total
  • [1] Spatio-Temporal Steerable Pyramid for Human Action Recognition
    Zhen, Xiantong
    Shao, Ling
    [J]. 2013 10TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOPS ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG), 2013,
  • [2] Spatio-Temporal Pyramid Model Based on Depth Maps for Action Recognition
    Xu, Haining
    Chen, Enqing
    Liang, Chengwu
    Qi, Lin
    Guan, Ling
    [J]. 2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015,
  • [3] Spatio-Temporal Laplacian Pyramid Coding for Action Recognition
    Shao, Ling
    Zhen, Xiantong
    Tao, Dacheng
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (06) : 817 - 827
  • [4] Human Action Recognition Based on a Spatio-Temporal Video Autoencoder
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (11)
  • [5] Spatio-Temporal Collaborative Module for Efficient Action Recognition
    Hao, Yanbin
    Wang, Shuo
    Tan, Yi
    He, Xiangnan
    Liu, Zhenguang
    Wang, Meng
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 7279 - 7291
  • [6] Spatio-temporal Video Autoencoder for Human Action Recognition
    Sousa e Santos, Anderson Carlos
    Pedrini, Helio
    [J]. PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2019, : 114 - 123
  • [7] Exploiting spatio-temporal knowledge for video action recognition
    Zhang, Huigang
    Wang, Liuan
    Sun, Jun
    [J]. IET COMPUTER VISION, 2023, 17 (02) : 222 - 230
  • [8] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [9] STSM: Spatio-Temporal Shift Module for Efficient Action Recognition
    Yang, Zhaoqilin
    An, Gaoyun
    Zhang, Ruichen
    [J]. MATHEMATICS, 2022, 10 (18)
  • [10] A spatio-temporal pyramid matching for video retrieval
    Choi, Jaesik
    Wang, Ziyu
    Lee, Sang-Chul
    Jeon, Won J.
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2013, 117 (06) : 660 - 669