Exploiting spatio-temporal representation for 3D human action recognition from depth map sequences

Cited by: 25
Authors
Ji, Xiaopeng [1, 2]
Zhao, Qingsong [2, 5]
Cheng, Jun [3, 4]
Ma, Chenfei [2, 6]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Shenzhen Inst Adv Technol, Chinese Acad Sci, CAS Key Lab Human Machine Intelligence Synergy Sy, Shenzhen, Peoples R China
[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen, Peoples R China
[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[5] Tongji Univ, Sch Elect & Informat Engn, Shanghai, Peoples R China
[6] Northeastern Univ, Coll Med & Biol Informat Engn, Boston, MA 02115 USA
Funding
National Natural Science Foundation of China
Keywords
3D human action recognition; Depth map sequences; Short-term modeling; Depth-oriented gradient vector
DOI
10.1016/j.knosys.2021.107040
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human action recognition based on 3D data is attracting increasing attention because such data can provide richer spatial and temporal information than RGB videos. The challenge for depth map based methods is to capture the cues linking spatial appearance and temporal motion. In this paper, we propose a straightforward and efficient framework for modeling human actions from depth map sequences that considers both short-term and long-term dependencies. A frame-level feature, termed the depth-oriented gradient vector (DOGV), is developed to capture appearance and motion over a short duration. For long-term dependencies, we construct a convolutional neural network (CNN) based backbone to aggregate the frame-level features across space and time. The proposed method is comprehensively evaluated on four public benchmark datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD and UOW LSC. The experimental results demonstrate that the proposed approach addresses 3D human action recognition efficiently and achieves state-of-the-art performance. © 2021 Elsevier B.V. All rights reserved.
Pages: 11