Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition

Cited by: 254
Authors
Kovashka, Adriana [1]
Grauman, Kristen [1]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
DOI
10.1109/CVPR.2010.5539881
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure "bag-of-words" model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
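As a rough illustration of the first steps the abstract describes (quantizing local descriptors to a visual vocabulary, then describing each interest point by the words of its space-time neighbors), here is a minimal NumPy sketch. It is not the authors' implementation. The function names, the toy data, and the fixed per-axis `scale` weights are all assumptions made for illustration: the paper learns class-specific distance functions instead of fixing the scaling, and it also encodes neighbor orientation and builds a recursive hierarchy, both of which this sketch omits.

```python
import numpy as np


def quantize(descriptors, vocabulary):
    """Assign each descriptor the index of its nearest vocabulary word."""
    # Pairwise squared Euclidean distances, shape (n_points, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def neighborhood_histograms(points, words, n_words, k, scale):
    """Histogram the visual words of each point's k nearest neighbors.

    `points` holds (x, y, t) coordinates. `scale` is a hypothetical fixed
    weighting of the three axes; the paper learns this distance instead.
    """
    scaled = points * scale
    d2 = ((scaled[:, None, :] - scaled[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    hists = np.zeros((len(points), n_words), dtype=int)
    for i, nbrs in enumerate(nearest):
        np.add.at(hists[i], words[nbrs], 1)  # count words, repeats included
    return hists


# Toy data (made up): four interest points with 1-D descriptors.
points = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 5], [10, 10, 10]], dtype=float)
descriptors = np.array([[0.1], [0.9], [0.2], [1.1]])
vocabulary = np.array([[0.0], [1.0]])

words = quantize(descriptors, vocabulary)

# Same points, two different space-time scalings.
hists_isotropic = neighborhood_histograms(
    points, words, n_words=2, k=1, scale=np.array([1.0, 1.0, 1.0]))
hists_time_compressed = neighborhood_histograms(
    points, words, n_words=2, k=1, scale=np.array([1.0, 1.0, 0.1]))
```

With the isotropic scaling, the nearest neighbor of the first point is its spatial neighbor; once the temporal axis is compressed, the temporally offset point becomes nearest instead. This shows why the choice of space-time scaling changes which configurations a neighborhood descriptor captures.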
Pages: 2046-2053
Page count: 8