Efficient spatio-temporal network for action recognition

Cited by: 0
Authors
Su, Yanxiong [1]
Zhao, Qian [1]
Affiliations
[1] Shanghai Univ Elect Power, Coll Elect & Informat Engn, Shanghai 201306, Peoples R China
Keywords
Spatio-temporal feature; Motion feature learning; Video action recognition; Channel feature
DOI
10.1007/s11554-024-01541-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The input tensor of video data includes temporal, spatial, and channel dimensions, crucial for extracting complementary spatial, temporal, and spatio-temporal features for video action recognition. To efficiently extract and integrate these features, we propose an efficient spatio-temporal module (ESTM) with three pathways dedicated to extracting spatial, temporal, and spatio-temporal features. Each pathway uses the Cross Global Average Pooling (CGAP) module to compress the current dimension, focusing features on the remaining two dimensions. This enhances feature extraction and recognition rates for complex actions. We also introduce a Motion Excitation Module (MEM) to enrich input features by transforming correlations between adjacent frames, reducing computational complexity. Finally, ESTM and MEM are seamlessly integrated into a 2D CNN, forming the efficient spatio-temporal network (ESTN), with minimal impact on network parameters and computational costs. Extensive experiments show that ESTN outperforms state-of-the-art methods on datasets like Something V1 & V2 and HMDB51, validating its effectiveness.
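The abstract does not give the internals of CGAP or MEM, so the following is only a minimal NumPy sketch of the two general ideas it names: averaging away one dimension of a video tensor so the result varies only over the remaining ones, and deriving a motion cue from adjacent frames. The `(T, C, H, W)` layout, the helper names `pool_one_axis` and `adjacent_frame_difference`, the clip size, and the use of a plain frame difference are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pool_one_axis(x, axis):
    """Average over the given axis (or axes), keeping them as size 1."""
    return x.mean(axis=axis, keepdims=True)

def adjacent_frame_difference(x):
    """Assumed motion cue: difference between consecutive frames on axis 0."""
    diff = x[1:] - x[:-1]
    # Pad the final time step with zeros so the output keeps T frames.
    pad = np.zeros_like(x[:1])
    return np.concatenate([diff, pad], axis=0)

# Hypothetical clip: T=8 frames, C=16 channels, H=W=14.
rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 14, 14))

# Compress one dimension at a time; what remains varies over the other two.
no_time = pool_one_axis(clip, axis=0)        # channel x space remain
no_channel = pool_one_axis(clip, axis=1)     # time x space remain
no_space = pool_one_axis(clip, axis=(2, 3))  # time x channel remain

motion = adjacent_frame_difference(clip)

print(no_time.shape, no_channel.shape, no_space.shape, motion.shape)
```

Keeping the pooled axis as size 1 lets each compressed tensor broadcast back against the full clip, which is one common way such pooled descriptors are reused as weights; whether ESTM does this is not stated in the abstract.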
Pages: 13
Related papers
50 in total
  • [31] ACTION RECOGNITION USING SPATIO-TEMPORAL DIFFERENTIAL MOTION
    Yadav, Gaurav Kumar
    Sethi, Amit
    [J]. 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3415 - 3419
  • [32] Spatio-temporal shape and flow correlation for action recognition
    Ke, Yan
    Sukthankar, Rahul
    Hebert, Martial
    [J]. 2007 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-8, 2007, : 3835 - +
  • [33] Spatio-Temporal Laplacian Pyramid Coding for Action Recognition
    Shao, Ling
    Zhen, Xiantong
    Tao, Dacheng
    Li, Xuelong
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (06) : 817 - 827
  • [34] Interpretable Spatio-temporal Attention for Video Action Recognition
    Meng, Lili
    Zhao, Bo
    Chang, Bo
    Huang, Gao
    Sun, Wei
    Tung, Frederich
    Sigal, Leonid
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1513 - 1522
  • [35] Spatio-Temporal Attention Networks for Action Recognition and Detection
    Li, Jun
    Liu, Xianglong
    Zhang, Wenxuan
    Zhang, Mingyuan
    Song, Jingkuan
    Sebe, Nicu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (11) : 2990 - 3001
  • [36] Spatio-temporal Contrastive Domain Adaptation for Action Recognition
    Song, Xiaolin
    Zhao, Sicheng
    Yang, Jingyu
    Yue, Huanjing
    Xu, Pengfei
    Hu, Runbo
    Chai, Hua
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 9782 - 9790
  • [37] Hierarchical Spatio-Temporal Context Modeling for Action Recognition
    Sun, Ju
    Wu, Xiao
    Yan, Shuicheng
    Cheong, Loong-Fah
    Chua, Tat-Seng
    Li, Jintao
    [J]. CVPR: 2009 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-4, 2009, : 2004 - +
  • [38] Human Action Recognition Using Spatio-temporal Classification
    Fang, Chin-Hsien
    Chen, Ju-Chin
    Tseng, Chien-Chung
    Lien, Jenn-Jier James
    [J]. COMPUTER VISION - ACCV 2009, PT II, 2010, 5995 : 98 - 109
  • [39] Human Action Recognition Based on Spatio-temporal Features
    Sawant, Nikhil
    Biswas, K. K.
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 357 - 362
  • [40] Skeleton Action Recognition Based on Spatio-temporal Feature Enhanced Graph Convolutional Network
    Cao, Yi
    Wu, Weiguan
    Li, Ping
    Xia, Yu
    Gao, Qingyuan
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (08) : 3022 - 3031