Dynamic Gesture Recognition Network Based on Multiscale Spatiotemporal Feature Fusion

Cited by: 2
Authors
Liu, Jie [1]
Wang, Yue [1]
Tian, Ming [2]
Affiliations
[1] Harbin University of Science and Technology, Harbin 150080, People's Republic of China
[2] China Telecom Heilongjiang Branch, Harbin 150040, People's Republic of China
Keywords
Dynamic gesture recognition; Deep learning; Convolutional vision Transformer (CvT); Multiscale fusion
DOI
10.11999/JEIT220758
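Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract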
Because of the temporal and spatial complexity of dynamic gesture data, traditional machine learning algorithms struggle to extract accurate gesture features, while existing dynamic gesture recognition algorithms suffer from complex network designs, large parameter counts, and insufficient gesture feature extraction. To address these problems, a multiscale spatiotemporal feature fusion network based on the Convolutional vision Transformer (CvT) is proposed. First, the CvT network, originally used for image classification, is introduced into dynamic gesture classification: it extracts the spatial features of each single gesture image and fuses shallow and deep features at different spatial scales. Second, a multi-time-scale aggregation module is designed to extract the spatiotemporal features of dynamic gestures; combining the CvT network with this module suppresses invalid features. Finally, to compensate for the shortcomings of the dropout layer in the CvT network, the R-Drop method is applied to the multiscale spatiotemporal feature fusion network. Experiments on the Jester dataset show that the proposed method outperforms existing dynamic gesture recognition methods in recognition rate, reaching 92.26%.
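The abstract names R-Drop as the regularizer but gives no formula. As a rough illustration only, the sketch below follows the commonly cited R-Drop formulation: the same input is passed through the network twice with dropout active, and the loss adds a symmetric KL term between the two predicted distributions to the two classification losses. The function names, the weight `alpha`, and the example logits are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with nonzero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """R-Drop style loss for one sample (hypothetical helper).

    logits_a, logits_b: outputs of two forward passes of the same input,
    each with an independent dropout mask.
    alpha: weight of the consistency term (assumed value, not from the paper).
    """
    p, q = softmax(logits_a), softmax(logits_b)
    # Negative log-likelihood of the true class under both passes.
    nll = -math.log(p[label]) - math.log(q[label])
    # Symmetric KL divergence pulls the two predictions together.
    sym_kl = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return nll + alpha * sym_kl


if __name__ == "__main__":
    # Two made-up logit vectors standing in for two dropout passes.
    print(r_drop_loss([1.0, 2.0, 3.0], [1.2, 1.8, 3.1], label=2))
```

When both passes produce identical logits the KL term vanishes and the loss reduces to twice the ordinary cross-entropy, so the extra term penalizes only the disagreement that dropout introduces between the two passes.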
Pages: 2614-2622
Page count: 9