Exploiting deep residual networks for human action recognition from skeletal data

Cited by: 46
Authors
Huy-Hieu Pham [1 ,2 ]
Khoudour, Louahdi [1 ]
Crouzil, Alain [2 ]
Zegers, Pablo [3 ]
Velastin, Sergio A. [4 ,5 ]
Affiliations
[1] Ctr Etud & Expertise Risques Environm Mobilite &, F-31400 Toulouse, France
[2] Univ Toulouse, UPS, Inst Rech Informat Toulouse IRIT, F-31062 Toulouse 9, France
[3] Aparnix, La Gioconda 4355,10B, Santiago, Chile
[4] Univ Carlos III Madrid, Appl Artificial Intelligence Res Grp, Dept Comp Sci, Madrid 28270, Spain
[5] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
Keywords
3D Action recognition; Deep residual networks; Skeletal data;
DOI
10.1016/j.cviu.2018.03.003
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The computer vision community is currently focusing on solving action recognition problems in real videos, which contain thousands of samples with many challenges. In this process, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state-of-the-art in various vision-based action recognition systems. Recently, the introduction of residual connections in conjunction with a more traditional CNN model in a single architecture called Residual Network (ResNet) has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets for human action recognition using skeletal data provided by depth sensors. Firstly, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images are able to capture the spatio-temporal evolution of 3D motions from skeleton sequences and can be efficiently learned by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from the obtained color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all these benchmarks while requiring fewer computational resources. In particular, the proposed method surpasses previous approaches by a significant margin of 3.4% on the MSR Action 3D dataset, 0.67% on the KARD dataset, and 2.5% on the NTU-RGB+D dataset.
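The abstract's core encoding step, mapping the 3D joint coordinates of a skeleton sequence to an RGB image that a CNN can consume, can be sketched as follows. This is a minimal illustration, not the paper's exact method: the function name `skeleton_to_rgb` and the per-axis min-max normalization are assumptions; the paper may use a different normalization and joint ordering.

```python
import numpy as np

def skeleton_to_rgb(seq):
    """Encode a skeleton sequence as an 8-bit RGB image.

    seq: array of shape (frames, joints, 3) holding 3D joint coordinates.
    Returns a uint8 array of shape (joints, frames, 3): rows index joints,
    columns index time, and the (x, y, z) axes map to the (R, G, B) channels,
    so spatial structure and temporal evolution both appear in the image.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)        # per-axis minimum over the sequence
    hi = seq.max(axis=(0, 1), keepdims=True)        # per-axis maximum over the sequence
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)   # scale each axis into [0, 1]
    img = np.rint(norm * 255.0).astype(np.uint8)    # quantize to 8-bit color values
    return img.transpose(1, 0, 2)                   # reorder to (joints, frames, 3)

# Example: a random 40-frame sequence with 25 joints (NTU-RGB+D skeletons have 25 joints)
demo = skeleton_to_rgb(np.random.rand(40, 25, 3))
```

The resulting fixed-layout color image can then be fed to a standard image classifier such as a ResNet, which is the second stage the abstract describes.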
Pages: 51 - 66
Page count: 16