Exploiting spatio-temporal representation for 3D human action recognition from depth map sequences

Cited by: 25

Authors:

Ji, Xiaopeng [1,2]

Zhao, Qingsong [2,5]

Cheng, Jun [3,4]

Ma, Chenfei [2,6]

Affiliations:

[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China

[2] Shenzhen Inst Adv Technol, Chinese Acad Sci, CAS Key Lab Human Machine Intelligence Synergy Sy, Shenzhen, Peoples R China

[3] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Hong Kong Macao Joint Lab Human Machine, Shenzhen, Peoples R China

[4] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[5] Tongji Univ, Sch Elect & Informat Engn, Shanghai, Peoples R China

[6] Northeastern Univ, Coll Med & Biol Informat Engn, Boston, MA 02115 USA

Source:

KNOWLEDGE-BASED SYSTEMS | 2021 / Vol. 227

Funding:

National Natural Science Foundation of China;

Keywords:

3D human action recognition; Depth map sequences; Short-term modeling; Depth-oriented gradient vector;

DOI:

10.1016/j.knosys.2021.107040

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human action recognition based on 3D data is attracting increasing attention because it can provide richer spatial and temporal information than RGB videos. The challenge for depth map based methods is to capture the cues between spatial appearance and temporal motion. In this paper, we propose a straightforward and efficient framework for modeling human actions from depth map sequences that considers both short-term and long-term dependencies. A frame-level feature, termed the depth-oriented gradient vector (DOGV), is developed to capture appearance and motion over a short duration. For long-term dependence, we construct a convolutional neural network (CNN) based backbone to aggregate frame-level features across space and time. The proposed method is comprehensively evaluated on four public benchmark datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD and UOW LSC. The experimental results demonstrate that the proposed approach can solve the problem of 3D human action recognition efficiently and achieve state-of-the-art performance. (C) 2021 Elsevier B.V. All rights reserved.


Pages: 11
