Action recognition by spatio-temporal oriented energies

Cited by: 45
Authors
Zhen, Xiantong [1, 2]
Shao, Ling [1, 2]
Li, Xuelong [3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Coll Elect & Informat Engn, Nanjing 210044, Jiangsu, Peoples R China
[2] Univ Sheffield, Dept Elect & Elect Engn, Sheffield S1 3JD, S Yorkshire, England
[3] Chinese Acad Sci, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Xian 710119, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Steerable filters; Spatio-temporal oriented energies; Spatio-temporal Laplacian pyramid; MODELS;
DOI
10.1016/j.ins.2014.05.021
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present a unified representation based on the spatio-temporal steerable pyramid (STSP) for the holistic representation of human actions. A video sequence is viewed as a spatio-temporal volume that preserves all the appearance and motion information of the action it contains. By decomposing spatio-temporal volumes into band-passed sub-volumes, the spatio-temporal Laplacian pyramid provides an effective technique for multi-scale analysis of video sequences, so that spatio-temporal patterns at different scales can be well localized and captured. To efficiently explore the underlying local spatio-temporal orientation structures at multiple scales, a bank of three-dimensional separable steerable filters is applied to each sub-volume of the Laplacian pyramid. The outputs of the quadrature pair of steerable filters are squared and summed to yield a more robust oriented energy representation. For further invariance and compactness, spatio-temporal max pooling is performed between filter responses at adjacent scales and over spatio-temporal neighbourhoods. To capture the appearance, local geometric structure and motion of an action, we apply the STSP to the intensity, 3D gradients and optical flow of video sequences, yielding a unified holistic representation of human actions. Taking advantage of multi-scale, multi-orientation analysis and feature pooling, STSP produces a compact yet informative and invariant representation of human actions. We conduct extensive experiments on the KTH, UCF Sports and HMDB51 datasets, which show that the unified STSP achieves results comparable with state-of-the-art methods. (C) 2014 Elsevier Inc. All rights reserved.
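The oriented-energy and pooling steps described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: a 1D Gabor cosine/sine pair stands in for the paper's 3D separable steerable filters, and every function name and parameter value below (`gabor_pair`, `sigma`, `freq`, `radius`, the pooling window) is an assumption chosen for illustration only.

```python
import math


def gabor_pair(sigma=2.0, freq=0.25, radius=6):
    """Return (even, odd) 1D Gabor kernels, an approximate quadrature pair.

    Assumption: a Gabor pair replaces the paper's steerable filters.
    """
    even, odd = [], []
    for k in range(-radius, radius + 1):
        envelope = math.exp(-(k * k) / (2.0 * sigma * sigma))
        even.append(envelope * math.cos(2.0 * math.pi * freq * k))
        odd.append(envelope * math.sin(2.0 * math.pi * freq * k))
    return even, odd


def filter_same(signal, kernel):
    """Correlate signal with kernel (zero padding), keeping the input length."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def oriented_energy(signal, even, odd):
    """Square and sum the two quadrature responses at every sample."""
    resp_even = filter_same(signal, even)
    resp_odd = filter_same(signal, odd)
    return [a * a + b * b for a, b in zip(resp_even, resp_odd)]


def max_pool(values, size):
    """Max pooling over non-overlapping neighbourhoods of `size` samples."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


def pool_across_scales(fine, coarse):
    """Element-wise max of responses from two adjacent scales.

    Assumes both responses were already resampled to the same length.
    """
    return [max(a, b) for a, b in zip(fine, coarse)]


if __name__ == "__main__":
    even, odd = gabor_pair()
    # A sinusoid at the filters' tuned frequency with an arbitrary phase.
    signal = [math.cos(2.0 * math.pi * 0.25 * n + 0.7) for n in range(64)]
    energy = oriented_energy(signal, even, odd)
    pooled = max_pool(energy, 4)
    print(len(energy), len(pooled))
```

Away from the borders, the energy of a sinusoid at the tuned frequency stays nearly constant regardless of its phase, which is the sense in which squaring and summing a quadrature pair gives a more robust response than either filter alone. Extending the sketch to video would require 3D filtering along the two spatial axes and time, which is omitted here.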
Pages: 295-309
Number of pages: 15