Learning Spatiotemporal Features using 3DCNN and Convolutional LSTM for Gesture Recognition

Cited by: 167
Authors
Zhang, Liang [1]
Zhu, Guangming [1]
Shen, Peiyi [1]
Song, Juan [1]
Shah, Syed Afaq [2]
Bennamoun, Mohammed [2]
Affiliations
[1] Xidian Univ, Sch Software, Xian, Shaanxi, Peoples R China
[2] Univ Western Australia, Nedlands, WA, Australia
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
DOI
10.1109/ICCVW.2017.369
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gesture recognition aims to understand ongoing human gestures. In this paper, we present a deep architecture that learns spatiotemporal features for gesture recognition. The architecture first learns 2D spatiotemporal feature maps using 3D convolutional neural networks (3DCNN) and bidirectional convolutional long short-term memory networks (ConvLSTM). The learnt 2D feature maps encode global temporal information and local spatial information simultaneously. A 2DCNN is then used to learn higher-level spatiotemporal features from the 2D feature maps for the final gesture recognition. The spatiotemporal correlation information is kept through the whole process of feature learning, which makes the architecture an effective spatiotemporal feature learner. Experiments on the ChaLearn LAP large-scale isolated gesture dataset (IsoGD) and the Sheffield Kinect Gesture (SKIG) dataset demonstrate the superiority of the proposed deep architecture.
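The three stages named in the abstract (3DCNN, bidirectional ConvLSTM, 2DCNN) can be illustrated by tracing tensor shapes through them. The sketch below is only a shape walk-through under stated assumptions: the clip size, channel counts, kernel sizes, and pooling factors are hypothetical and are not taken from the paper, and it assumes the bidirectional ConvLSTM keeps only the final hidden state of each direction, which is one plausible way to obtain 2D feature maps that summarize the whole clip.

```python
# Shape walk-through of a 3DCNN -> bidirectional ConvLSTM -> 2DCNN pipeline.
# All layer sizes are illustrative assumptions, not the paper's settings.


def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv3d_shape(shape, out_channels, kernel=3, padding=1, pool=(1, 2, 2)):
    """3D convolution with 'same' padding, then max pooling.

    shape is (channels, frames, height, width). The assumed pooling window
    (1, 2, 2) halves height and width but leaves the frame count unchanged.
    """
    _, t, h, w = shape
    t, h, w = (conv_out(s, kernel, 1, padding) for s in (t, h, w))
    pt, ph, pw = pool
    return (out_channels,
            conv_out(t, pt, pt), conv_out(h, ph, ph), conv_out(w, pw, pw))


def bi_convlstm_shape(shape, hidden_channels):
    """Bidirectional ConvLSTM reduced to its final hidden states (assumption).

    The time axis disappears; the forward and backward states are
    concatenated along channels, giving a 2D map (channels, height, width).
    """
    _, _, h, w = shape
    return (2 * hidden_channels, h, w)


def conv2d_shape(shape, out_channels, kernel=3, padding=1, pool=2):
    """2D convolution with 'same' padding, then 2x2 max pooling."""
    _, h, w = shape
    h, w = (conv_out(s, kernel, 1, padding) for s in (h, w))
    return (out_channels, conv_out(h, pool, pool), conv_out(w, pool, pool))


def trace(clip_shape):
    """Return (stage name, output shape) pairs for the whole pipeline."""
    stages = [("input clip", clip_shape)]
    x = conv3d_shape(clip_shape, out_channels=64)
    stages.append(("3DCNN", x))
    x = bi_convlstm_shape(x, hidden_channels=128)
    stages.append(("bidirectional ConvLSTM", x))
    x = conv2d_shape(x, out_channels=256)
    stages.append(("2DCNN", x))
    return stages


if __name__ == "__main__":
    # Hypothetical input: 32 RGB frames of 112 x 112 pixels.
    for name, shape in trace((3, 32, 112, 112)):
        print(f"{name:>24}: {shape}")
```

In this sketch the time axis survives the 3DCNN stage and is removed only by the recurrent stage, so the 2DCNN operates on maps that already summarize the full clip while keeping a spatial layout.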
Pages: 3120-3128
Page count: 9
Related papers
50 in total
  • [31] Speaker Recognition Based on 3DCNN-LSTM
    Hu, ZhangFang
    Si, XingTong
    Luo, Yuan
    Tang, ShanShan
    Jian, Fang
    ENGINEERING LETTERS, 2021, 29 (02) : 463 - 470
  • [32] Dynamic Gesture Recognition Based on 3D Separable Convolutional LSTM Networks
    Zhang, Xunlei
    Tie, Yun
    Qi, Lin
    PROCEEDINGS OF 2020 IEEE 11TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2020), 2020 : 180 - 183
  • [33] A hybrid approach for search and rescue using 3DCNN and PSO
    Balmukund Mishra
    Deepak Garg
    Pratik Narang
    Vipul Mishra
    Neural Computing and Applications, 2021, 33 : 10813 - 10827
  • [34] Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks
    Jiang, Zhuolin
    Rozgic, Viktor
    Adali, Sancar
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017 : 309 - 317
  • [35] Eulerian Motion Based 3DCNN Architecture for Facial Micro-Expression Recognition
    Wang, Yahui
    Ma, Huimin
    Xing, Xinpeng
    Pan, Zeyu
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 266 - 277
  • [36] Multi-class Classification of Alzheimer's Disease using 3DCNN Features and Multilayer Perceptron
    Raju, Manu
    Gopi, Varun P.
    Anitha, V. S.
    2021 SIXTH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2021 : 368 - 373
  • [37] 3DCNN predicting brain age using diffusion tensor imaging
    Yuqi Wang
    Jingxi Wen
    Jiang Xin
    Yunhao Zhang
    Hua Xie
    Yan Tang
    Medical & Biological Engineering & Computing, 2023, 61 : 3335 - 3344
  • [38] Egocentric Gesture Recognition Using 3D Convolutional Neural Networks for the Spatiotemporal Adaptation of Collaborative Robots
    Papanagiotou, Dimitris
    Senteri, Gavriela
    Manitsaris, Sotiris
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [39] Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules
    Cao, Congqi
    Zhang, Yifan
    Wu, Yi
    Lu, Hanqing
    Cheng, Jian
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 3783 - 3791
  • [40] Learning Spatiotemporal Features with 3D Convolutional Networks
    Du Tran
    Bourdev, Lubomir
    Fergus, Rob
    Torresani, Lorenzo
    Paluri, Manohar
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015 : 4489 - 4497