A Hierarchical Spatio-Temporal Model for Human Activity Recognition

Cited by: 33
Authors
Xu, Wanru [1]
Miao, Zhenjiang [1]
Zhang, Xiao-Ping [2]
Tian, Yi [1]
Affiliations
[1] Beijing Jiaotong Univ, Beijing 100044, Peoples R China
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Keywords
Activity recognition; hidden conditional random field (HCRF); hierarchical structure; spatio-temporal dependencies; hidden Markov model; framework
DOI
10.1109/TMM.2017.2674622
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human activity recognition involves two key issues: spatial dependencies and temporal dependencies. Most recent methods focus on only one of them and therefore lack the descriptive power to recognize complex activities. In this paper, we propose a hierarchical spatio-temporal model (HSTM) that addresses the problem by modeling spatial and temporal constraints simultaneously. The HSTM is a two-layer hidden conditional random field (HCRF): the bottom-layer HCRF describes spatial relations in each frame and learns more discriminative representations, and the top-layer HCRF uses these high-level features to characterize temporal relations across the whole video sequence. The HSTM uses the bottom layer as the building block for the top layer and aggregates evidence from the local to the global level. A novel learning algorithm is derived to train all model parameters efficiently, and its effectiveness is validated theoretically. Experimental results show that the HSTM classifies human activities with higher accuracy on single-person actions (UCF) than other existing methods. More importantly, the HSTM also achieves superior performance on more practical interactions, including human-human interactions (UT-Interaction, BIT-Interaction, and CASIA) and human-object interactions (Gupta video dataset).
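The two-layer idea in the abstract can be illustrated with a generic stacked chain HCRF. This is only a sketch under stated assumptions, not the authors' model: the record does not give the paper's graph structure, potentials, or learning algorithm, so the linear-chain layout, the linear potentials, and all names here (`chain_hcrf_log_scores`, `class_posterior`, `two_layer_posterior`) are hypothetical. The bottom layer marginalizes hidden states over the parts of one frame and emits a per-frame class posterior as a feature; the top layer runs the same computation over the sequence of frame features.

```python
import numpy as np


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis)
    return m + np.log(np.sum(np.exp(a - np.expand_dims(m, axis)), axis=axis))


def chain_hcrf_log_scores(X, W, T):
    """Unnormalized log score of each class for a linear-chain HCRF.

    X: (n_steps, d) observations.
    W: (n_classes, n_hidden, d) unary weights (assumed linear potentials).
    T: (n_classes, n_hidden, n_hidden) transition weights.
    Hidden-state sequences are summed out with the forward algorithm.
    """
    n_classes = W.shape[0]
    scores = np.empty(n_classes)
    for y in range(n_classes):
        unary = X @ W[y].T  # (n_steps, n_hidden)
        alpha = unary[0]
        for t in range(1, len(X)):
            # alpha[j] = unary[t, j] + log sum_i exp(alpha[i] + T[y, i, j])
            alpha = unary[t] + logsumexp(alpha[:, None] + T[y], axis=0)
        scores[y] = logsumexp(alpha, axis=0)
    return scores


def class_posterior(X, W, T):
    """P(y | X) obtained by normalizing the class scores."""
    s = chain_hcrf_log_scores(X, W, T)
    return np.exp(s - logsumexp(s, axis=0))


def two_layer_posterior(video, bottom, top):
    """Stack two chain HCRFs: parts within a frame, then frames in a video.

    video: (n_frames, n_parts, d) array of per-part features.
    bottom, top: (W, T) parameter tuples for each layer.
    """
    # Bottom layer: one feature vector (a class posterior) per frame.
    frame_feats = np.stack([class_posterior(frame, *bottom) for frame in video])
    # Top layer: temporal chain over the per-frame features.
    return class_posterior(frame_feats, *top)
```

In this sketch the top layer's input dimension equals the bottom layer's number of classes, because the per-frame posterior is used directly as the high-level feature; a trained model would fit both parameter sets rather than use fixed values.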
Pages: 1494-1509
Page count: 16
Related papers
  • [1] LEARNING A HIERARCHICAL SPATIO-TEMPORAL MODEL FOR HUMAN ACTIVITY RECOGNITION
    Xu, Wanru
    Miao, Zhenjiang
    Zhang, Xiao-Ping
    Tian, Yi
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 1607 - 1611
  • [2] Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition
    Tian, Yi
    Kong, Yu
    Ruan, Qiuqi
    An, Gaoyun
    Fu, Yun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (04) : 1748 - 1762
  • [3] Learning hierarchical spatio-temporal pattern for human activity prediction
    Ding, Wenwen
    Liu, Kai
    Cheng, Fei
    Zhang, Jin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2016, 35 : 103 - 111
  • [4] Spatio-Temporal Phrases for Activity Recognition
    Zhang, Yimeng
    Liu, Xiaoming
    Chang, Ming-Ching
    Ge, Weina
    Chen, Tsuhan
    COMPUTER VISION - ECCV 2012, PT III, 2012, 7574 : 707 - 721
  • [5] Learning Dynamic Spatio-Temporal Relations for Human Activity Recognition
    Liu, Zhenyu
    Yao, Yaqiang
    Liu, Yan
    Zhu, Yuening
    Tao, Zhenchao
    Wang, Lei
    Feng, Yuhong
    IEEE ACCESS, 2020, 8 : 130340 - 130352
  • [6] Spatio-temporal Weight of Active Region for Human Activity Recognition
    Lee, Dong-Gyu
    Won, Dong-Ok
    PATTERN RECOGNITION, ACPR 2021, PT I, 2022, 13188 : 92 - 103
  • [7] Hierarchical Spatio-Temporal Context Modeling for Action Recognition
    Sun, Ju
    Wu, Xiao
    Yan, Shuicheng
    Cheong, Loong-Fah
    Chua, Tat-Seng
    Li, Jintao
    CVPR: 2009 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-4, 2009, : 2004 - +
  • [8] Hierarchical Spatio-Temporal Representation Learning for Gait Recognition
    Wang, Lei
    Liu, Bo
    Liang, Fangfang
    Wang, Bincheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 19582 - 19592
  • [9] A Bayesian hierarchical spatio-temporal rainfall model
    Mashford, John
    Song, Yong
    Wang, Q. J.
    Robertson, David
    JOURNAL OF APPLIED STATISTICS, 2019, 46 (02) : 217 - 229
  • [10] A hierarchical network model for the analysis of human spatio-temporal information processing
    Schill, K
    Baier, V
    Röhrbein, F
    Brauer, W
    HUMAN VISION AND ELECTRONIC IMAGING VI, 2001, 4299 : 615 - 621