A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset

Cited by: 1
Authors
Bulbul, Mohammad Farhad [1 ,2 ]
Ullah, Amin [3 ]
Ali, Hazrat [4 ]
Kim, Daijin [1 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, 77 Cheongam, Pohang 37673, South Korea
[2] Jashore Univ Sci & Technol, Dept Math, Jashore 7408, Bangladesh
[3] Oregon State Univ, CORIS Inst, Corvallis, OR 97331 USA
[4] Hamad Bin Khalifa Univ, Qatar Fdn, Coll Sci & Engn, POB 34110, Doha, Qatar
Keywords
3D action recognition; depth map sequence; CNN; transfer learning; bi-directional LSTM; RNN; attention; BIDIRECTIONAL LSTM; FUSION; IMAGE; 2D;
DOI
10.3390/s22186841
CLC Classification
O65 [Analytical Chemistry];
Discipline Codes
070302 ; 081704 ;
Abstract
Deep models that recognize human actions from depth video sequences are scarce compared to models based on RGB and skeleton video sequences. This scarcity limits research advancements based on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence classification deep model that uses depth video data for scenarios in which the video data are limited. Rather than summarizing each frame's content into a single class label, our method directly classifies a depth video, i.e., a sequence of depth frames. First, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with the three temporal motion sequences, the input depth frame sequence offers a four-stream representation of the input depth action video. Next, the DenseNet121 architecture with ImageNet pre-trained weights is employed to extract discriminative frame-level action features from the depth and temporal motion frames. The resulting four sets of frame-level feature vectors, one per stream, are fed into four bi-directional LSTM (BLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, the concatenation of their outputs is processed through dense layers to classify the input depth video. Experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is effective even with insufficient training samples and outperforms existing depth-data-based action recognition methods.
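The abstract describes a four-stream architecture: per-frame DenseNet121 features, one BLSTM per stream, multi-head self-attention over the temporal features, and dense layers over the concatenated stream outputs. The following PyTorch sketch illustrates that flow under stated assumptions; the hidden size, attention-head count, mean temporal pooling, classifier width, and replication of single-channel depth maps to three channels are illustrative choices, not the paper's reported configuration, and generation of the three motion-frame streams is treated as a given preprocessing step.

# Minimal sketch of the four-stream pipeline described in the abstract.
# Hyperparameters below are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
from torchvision.models import densenet121

class StreamEncoder(nn.Module):
    """One stream: DenseNet121 frame features -> BLSTM -> self-attention."""
    def __init__(self, feat_dim=1024, hidden=256, heads=4):
        super().__init__()
        backbone = densenet121(weights="IMAGENET1K_V1")  # ImageNet pre-trained
        self.cnn = nn.Sequential(backbone.features,
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.blstm = nn.LSTM(feat_dim, hidden,
                             batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)

    def forward(self, frames):              # frames: (B, T, 3, H, W);
        b, t = frames.shape[:2]             # depth maps assumed tiled to 3 ch.
        f = self.cnn(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, 1024)
        h, _ = self.blstm(f)                               # (B, T, 2*hidden)
        a, _ = self.attn(h, h, h)                          # self-attention
        return a.mean(dim=1)                               # pool over time

class FourStreamClassifier(nn.Module):
    """Depth stream + three temporal-motion streams -> concat -> dense layers."""
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        self.streams = nn.ModuleList(StreamEncoder(hidden=hidden)
                                     for _ in range(4))
        self.head = nn.Sequential(nn.Linear(4 * 2 * hidden, 512), nn.ReLU(),
                                  nn.Linear(512, num_classes))

    def forward(self, views):   # views: list of 4 tensors, each (B, T, 3, H, W)
        z = torch.cat([enc(v) for enc, v in zip(self.streams, views)], dim=1)
        return self.head(z)     # class logits for the whole depth video

# Usage: four streams of 8 frames at 224x224, 20 classes as in MSRAction3D.
model = FourStreamClassifier(num_classes=20)
clips = [torch.randn(2, 8, 3, 224, 224) for _ in range(4)]
logits = model(clips)           # shape (2, 20)

With a BLSTM hidden size of 256, each stream yields a 512-dimensional temporal feature, so concatenating the four streams produces the 2048-dimensional vector consumed by the dense classifier.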
Pages: 22
Related Papers (50 in total)
  • [1] Fault recognition of rolling bearing with small-scale dataset based on transfer learning
    Wang, Ying
    Liang, Mingxuan
    Wu, Xiangwei
    Qian, Lijuan
    Chen, Li
    JOURNAL OF VIBROENGINEERING, 2021, 23 (05) : 1160 - 1170
  • [2] Evaluation of Small-Scale Deep Learning Architectures in Thai Speech Recognition
    Kaewprateep, Jirayu
    Prom-on, Santitham
    2018 1ST INTERNATIONAL ECTI NORTHERN SECTION CONFERENCE ON ELECTRICAL, ELECTRONICS, COMPUTER AND TELECOMMUNICATIONS ENGINEERING (ECTI-NCON), 2018 : 60 - 64
  • [3] Action Recognition with Temporal Scale-Invariant Deep Learning Framework
    Chen, Huafeng
    Chen, Jun
    Hu, Ruimin
    Chen, Chen
    Wang, Zhongyuan
    CHINA COMMUNICATIONS, 2017, 14 (02) : 163 - 172
  • [4] Privacy-Preserving Deep Action Recognition: An Adversarial Learning Framework and A New Dataset
    Wu, Zhenyu
    Wang, Haotao
    Wang, Zhaowen
    Jin, Hailin
    Wang, Zhangyang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (04) : 2126 - 2139
  • [5] A new framework for deep learning video based Human Action Recognition on the edge
    Cob-Parro, Antonio Carlos
    Losada-Gutierrez, Cristina
    Marron-Romera, Marta
    Gardel-Vicente, Alfredo
    Bravo-Munoz, Ignacio
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [6] Deep Learning Technology for Small-scale Data
    Ishii M.
    Sato A.
    INST. OF IMAGE INFORMATION AND TELEVISION ENGINEERS (74) : 26 - 29
  • [7] FlowerAction: a federated deep learning framework for video-based human action recognition
    Dinh, Thi Quynh Khanh
    Tran, Thanh-Hai
    Tran, Trung-Kien
    Le, Thi-Lan
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2025, 16 (02) : 459 - 470
  • [8] Human action recognition on depth dataset
    Gao, Zan
    Zhang, Hua
    Liu, Anan A.
    Xu, Guangping
    Xue, Yanbing
    NEURAL COMPUTING & APPLICATIONS, 2016, 27 (07) : 2047 - 2054