ESTI: an action recognition network with enhanced spatio-temporal information

Cited by: 1
Authors
Jiang, ZhiYu [1]
Zhang, Yi [1]
Hu, Shu [1]
Affiliation
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610000, Peoples R China
Keywords
Action recognition; Feature enhancement; Global multi-scale feature; Local motion extraction; Spatio-temporal information;
DOI
10.1007/s13042-023-01820-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition is an active topic in video understanding that aims to recognize human actions in videos. The critical step is to model spatio-temporal information and extract key action clues. To this end, we propose a simple and efficient network (dubbed ESTI) that consists of two core modules. The Local Motion Extraction module highlights the short-term temporal context, while the Global Multi-scale Feature Enhancement module strengthens the spatio-temporal and channel features to model long-term information. By appending ESTI to a 2D ResNet backbone, our network is capable of reasoning about different kinds of actions with various amplitudes in videos. Our network is developed on two GeForce RTX 3090 GPUs using Python 3.7 and PyTorch 1.8. Extensive experiments have been conducted on five mainstream datasets to verify the effectiveness of our network, in which ESTI outperforms most state-of-the-art methods in terms of accuracy, computational cost, and network scale. In addition, we visualize the feature representation of our model using Grad-CAM to validate its accuracy.
Pages: 3059-3070
Page count: 12