A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset

Cited by: 1
Authors
Bulbul, Mohammad Farhad [1 ,2 ]
Ullah, Amin [3 ]
Ali, Hazrat [4 ]
Kim, Daijin [1 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, 77 Cheongam, Pohang 37673, South Korea
[2] Jashore Univ Sci & Technol, Dept Math, Jashore 7408, Bangladesh
[3] Oregon State Univ, CORIS Inst, Corvallis, OR 97331 USA
[4] Hamad Bin Khalifa Univ, Qatar Fdn, Coll Sci & Engn, POB 34110, Doha, Qatar
Keywords
3D action recognition; depth map sequence; CNN; transfer learning; bi-directional LSTM; RNN; attention; BIDIRECTIONAL LSTM; FUSION; IMAGE; 2D;
DOI
10.3390/s22186841
CLC Number
O65 [Analytical Chemistry]
Subject Classification Codes
070302; 081704
Abstract
Deep models for recognizing human actions from depth video sequences are scarce compared with models based on RGB and skeleton video sequences. This scarcity limits research advancements based on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence-classification deep model for depth video data in scenarios where the video data are limited. Rather than classifying each frame individually, our method can directly classify a depth video, i.e., a sequence of depth frames. First, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with these three temporal motion sequences, the input depth frame sequence yields a four-stream representation of the input depth action video. Next, the DenseNet121 architecture with ImageNet pre-trained weights is employed to extract discriminative frame-level action features from the depth and temporal motion frames. The four resulting sets of frame-level feature vectors, one per stream, are fed into four bi-directional long short-term memory (BiLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, their concatenated outputs are processed through dense layers to classify the input depth video. Experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is effective even with limited training samples and outperforms existing depth-data-based action recognition methods.
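The four-stream pipeline summarized in the abstract can be sketched in PyTorch roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the small convolutional frame encoder stands in for the ImageNet-pretrained DenseNet121, and the layer sizes, the per-stream placement of the multi-head self-attention, and the temporal average pooling before concatenation are all assumptions.

```python
import torch
import torch.nn as nn


class FourStreamDepthActionNet(nn.Module):
    """Sketch of a four-stream depth-video classifier:
    frame encoder -> per-stream BiLSTM -> MHSA -> concat -> dense head."""

    def __init__(self, feat_dim=64, hidden=32, heads=4, num_classes=20):
        super().__init__()
        # Tiny stand-in for the DenseNet121 frame-level feature extractor.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, feat_dim),
        )
        # One BiLSTM per stream (depth + three motion-frame sequences).
        self.blstms = nn.ModuleList(
            nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
            for _ in range(4)
        )
        # Multi-head self-attention over the temporal features.
        self.mhsa = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Dense classification head over the concatenated stream summaries.
        self.head = nn.Sequential(
            nn.Linear(4 * 2 * hidden, 64), nn.ReLU(), nn.Linear(64, num_classes)
        )

    def forward(self, streams):
        # streams: list of 4 tensors, each shaped (B, T, 1, H, W)
        pooled = []
        for x, lstm in zip(streams, self.blstms):
            b, t = x.shape[:2]
            feats = self.encoder(x.flatten(0, 1)).view(b, t, -1)  # (B, T, F)
            h, _ = lstm(feats)                  # (B, T, 2*hidden)
            a, _ = self.mhsa(h, h, h)           # self-attention over time
            pooled.append(a.mean(dim=1))        # temporal average per stream
        return self.head(torch.cat(pooled, dim=1))
```

A forward pass takes a list of four `(batch, time, 1, H, W)` tensors and returns per-class logits; the real system would build the three motion streams from inter-frame differences of the depth video before feeding them in.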
Pages: 22
Related Papers (50 in total)
  • [21] An Improved Deep Learning Model of Chili Disease Recognition with Small Dataset
    Aminuddin, Nuramin Fitri
    Tukiran, Zarina
    Joret, Ariffuddin
    Tomari, Razali
    Morsin, Marlia
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (07) : 407 - 412
  • [22] Multi-patch integrated learning dehazing network for small-scale dataset
    Zhao, Ruini
    Han, Yi
    Liu, Ming
    Wang, Lujia
    Liu, Jianwei
    Zhang, Ping
    Gu, Yulei
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (05)
  • [23] Depth MHI Based Deep Learning Model for Human Action Recognition
    Gu, Ye
    Ye, Xiaofeng
    Sheng, Weihua
    2018 13TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2018, : 395 - 400
  • [24] Multi-feature consultation model for human action recognition in depth video sequence
    Liu, Xueping
    Li, Yibo
    Li, Xiaoming
    Tian, Can
    Yang, Yueqi
    JOURNAL OF ENGINEERING-JOE, 2018, (16): : 1498 - 1502
  • [25] Simulating small-scale dynamo action in cool main-sequence stars
    Riva, Fabio
    Steiner, Oskar
    Freytag, Bernd
    ASTRONOMY & ASTROPHYSICS, 2024, 684
  • [26] Multi-dimensional data modelling of video image action recognition and motion capture in deep learning framework
    Gao, Peijun
    Zhao, Dan
    Chen, Xuanang
    IET IMAGE PROCESSING, 2020, 14 (07) : 1257 - 1264
  • [27] Video-based driver action recognition via hybrid spatial-temporal deep learning framework
    Hu, Yaocong
    Lu, Mingqi
    Xie, Chao
    Lu, Xiaobo
    MULTIMEDIA SYSTEMS, 2021, 27 (03) : 483 - 501
  • [28] Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation
    Mei, Tao
    PROCEEDINGS OF THE 1ST WORKSHOP AND CHALLENGE ON COMPREHENSIVE VIDEO UNDERSTANDING IN THE WILD (COVIEW'18), 2018, : 1 - 1
  • [29] HybridHR-Net: Action Recognition in Video Sequences Using Optimal Deep Learning Fusion Assisted Framework
    Akbar, Muhammad Naeem
    Khan, Seemab
    Farooq, Muhammad Umar
    Alhaisoni, Majed
    Tariq, Usman
    Akram, Muhammad Usman
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (03): : 3275 - 3295
  • [30] Evaluating the Feasibility of Deep Learning for Action Recognition in Small Datasets
    Monteiro, Juarez
    Granada, Roger
    Aires, Joao Paulo
    Barros, Rodrigo C.
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,