A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset

Cited by: 1
Authors
Bulbul, Mohammad Farhad [1 ,2 ]
Ullah, Amin [3 ]
Ali, Hazrat [4 ]
Kim, Daijin [1 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Dept Comp Sci & Engn, 77 Cheongam, Pohang 37673, South Korea
[2] Jashore Univ Sci & Technol, Dept Math, Jashore 7408, Bangladesh
[3] Oregon State Univ, CORIS Inst, Corvallis, OR 97331 USA
[4] Hamad Bin Khalifa Univ, Qatar Fdn, Coll Sci & Engn, POB 34110, Doha, Qatar
Keywords
3D action recognition; depth map sequence; CNN; transfer learning; bi-directional LSTM; RNN; attention; BIDIRECTIONAL LSTM; FUSION; IMAGE; 2D;
DOI
10.3390/s22186841
Chinese Library Classification
O65 [Analytical Chemistry]
Discipline Codes
070302; 081704
Abstract
Deep models for recognizing human actions from depth video sequences are scarce compared to those based on RGB and skeleton video sequences. This scarcity limits research progress on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence classification deep model that uses depth video data in scenarios where the video data are limited. Rather than summarizing each frame's contents into a single class, our method directly classifies a depth video, i.e., a sequence of depth frames. First, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with these three temporal motion sequences, the input depth frame sequence forms a four-stream representation of the input depth action video. Next, the DenseNet121 architecture with ImageNet pre-trained weights is employed to extract discriminative frame-level action features from the depth and temporal motion frames. The four resulting sets of frame-level feature vectors, one per stream, are fed into four bi-directional LSTM (BLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, the concatenation of their outputs is processed through dense layers to classify the input depth video. Experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is effective even with insufficient training samples and is superior to existing depth data-based action recognition methods.
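The four-stream pipeline in the abstract (frame-level DenseNet121 features → per-stream BLSTMs → multi-head self-attention → dense classifier) can be sketched roughly as below. The layer sizes, class count, and pooling choice are illustrative assumptions, not the paper's reported hyper-parameters, and the DenseNet121 feature-extraction step is assumed to have been done beforehand.

```python
# Minimal sketch of a four-stream BLSTM + multi-head self-attention
# classifier in the spirit of the abstract. The 1024-d feature size
# (DenseNet121's final pooled features), 128 hidden units, 4 attention
# heads, and 20 classes are illustrative assumptions.
import torch
import torch.nn as nn

class FourStreamActionClassifier(nn.Module):
    def __init__(self, feat_dim=1024, hidden=128, heads=4, num_classes=20):
        super().__init__()
        # One bi-directional LSTM per stream (depth + three motion views).
        self.blstms = nn.ModuleList(
            nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
            for _ in range(4)
        )
        fused = 4 * 2 * hidden  # four streams, each 2*hidden after BLSTM
        # Multi-head self-attention over the concatenated temporal features.
        self.mhsa = nn.MultiheadAttention(fused, heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(fused, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, streams):
        # streams: list of four (batch, time, feat_dim) tensors holding
        # frame-level features (e.g., from a frozen DenseNet121 backbone).
        temporal = [blstm(s)[0] for blstm, s in zip(self.blstms, streams)]
        fused = torch.cat(temporal, dim=-1)           # (B, T, 4*2*hidden)
        attended, _ = self.mhsa(fused, fused, fused)  # self-attention
        pooled = attended.mean(dim=1)                 # temporal average pool
        return self.classifier(pooled)                # (B, num_classes)

model = FourStreamActionClassifier()
clips = [torch.randn(2, 8, 1024) for _ in range(4)]  # 2 videos, 8 frames each
logits = model(clips)
print(logits.shape)  # torch.Size([2, 20])
```

Keeping the four BLSTMs separate before fusion lets each view learn its own temporal dynamics, while the self-attention layer models correlations across the fused multi-view sequence, as the abstract describes.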
Pages: 22
Related Papers (50 total)
  • [41] Learning Motion Features from Dynamic Images of Depth Video for Human Action Recognition
    Huang, Yao
    Yang, Jianyu
    Shao, Zhanpeng
    Li, Youfu
    2021 27TH INTERNATIONAL CONFERENCE ON MECHATRONICS AND MACHINE VISION IN PRACTICE (M2VIP), 2021,
  • [43] Learning multi-temporal-scale deep information for action recognition
    Yao, Guangle
    Lei, Tao
    Zhong, Jiandan
    Jiang, Ping
    APPLIED INTELLIGENCE, 2019, 49 (06) : 2017 - 2029
  • [44] Object and Human Action Recognition From Video Using Deep Learning Models
    Soentanto, Padmeswari Nandiya
    Hendryli, Janson
    Herwindiati, Dyah E.
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 45 - 49
  • [45] Timed-image based deep learning for action recognition in video sequences
    Atto, Abdourrahmane Mahamane
    Benoit, Alexandre
    Lambert, Patrick
    PATTERN RECOGNITION, 2020, 104
  • [46] Unsupervised Deep Learning of Mid-Level Video Representation for Action Recognition
    Hou, Jingyi
    Wu, Xinxiao
    Chen, Jin
    Luo, Jiebo
    Jia, Yunde
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6910 - 6917
  • [47] Video Analytics Framework for Human Action Recognition
    Khan, Muhammad Attique
    Alhaisoni, Majed
    Armghan, Ammar
    Alenezi, Fayadh
    Tariq, Usman
    Nam, Yunyoung
    Akram, Tallha
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (03): : 3841 - 3859
  • [48] Video dataset of Balinese dance basic movement for action recognition
    Hendrawan, I. Nyoman Rudy
    Setyarini, Putu
    Permana, Putu Andika Tedja
    Hermanto, I. Made
    Putra, I. Gusti Agung Putu Dharma
    Maharani, Anak Agung Ayu Citra
    DATA IN BRIEF, 2024, 53
  • [49] Drone-Action: An Outdoor Recorded Drone Video Dataset for Action Recognition
    Perera, Asanka G.
    Law, Yee Wei
    Chahl, Javaan
    DRONES, 2019, 3 (04) : 1 - 16
  • [50] Deep learning with soft attention mechanism for small-scale ground roll attenuation
    Yang, Liuqing
    Fomel, Sergey
    Wang, Shoudong
    Chen, Xiaohong
    Chen, Yangkang
    GEOPHYSICS, 2024, 89 (01) : WA179 - WA193