Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition

Authors
Javed Imran
Balasubramanian Raman
Affiliation
[1] Indian Institute of Technology Roorkee, Department of Computer Science and Engineering
Keywords
Human action recognition; Deep learning; Convolutional neural network; Recurrent neural network; Multimodal fusion
Abstract
Fusion of multiple modalities from different sensors is an important area of research in multimodal human action recognition. In this paper, we conduct an in-depth study of how input preprocessing, data augmentation, network architecture and model fusion affect performance, with the aim of deriving a practical guideline for multimodal action recognition using the deep learning paradigm. First, for RGB videos, we propose a novel image-based descriptor called the stacked dense flow difference image (SDFDI), which captures the spatio-temporal information present in a video sequence. A variety of deep 2D convolutional neural networks (CNNs) are then trained to compare SDFDI against state-of-the-art image-based representations. Second, for the skeleton stream, we propose a data augmentation technique based on 3D transformations to facilitate training a deep neural network on small datasets. We also propose a recurrent neural network (RNN) based on bidirectional gated recurrent units (BiGRU) to model skeleton data. Third, for inertial sensor data, we propose data augmentation based on jittering with white Gaussian noise, along with a deep 1D-CNN for action classification. The outputs of these three heterogeneous networks (1D-CNN, 2D-CNN and BiGRU) are combined by a variety of model fusion approaches based on score and feature fusion. Finally, to illustrate the efficacy of the proposed framework, we test our model on the publicly available UTD-MHAD dataset and achieve an overall accuracy of 97.91%, which is about 4% higher than using each modality individually. We hope that the discussions and conclusions from this work will give researchers in related fields deeper insight and open avenues for further studies of multi-sensor fusion architectures.
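The abstract names three techniques without giving details: jittering inertial signals with white Gaussian noise, augmenting skeletons with 3D transformations, and combining network outputs by score fusion. The following is a minimal sketch of what each could look like, not the authors' implementation. The function names, the noise level `sigma`, the choice of a rotation about the vertical axis, and the use of averaged softmax scores are all assumptions made for illustration.

```python
import math
import random


def jitter(signal, sigma=0.05, rng=random):
    """Add zero-mean white Gaussian noise to every sample.

    signal: list of frames, each a list of channel values.
    sigma is a hypothetical noise level; the abstract does not state one.
    """
    return [[x + rng.gauss(0.0, sigma) for x in frame] for frame in signal]


def rotate_skeleton_y(skeleton, angle_deg):
    """Rotate every 3D joint about the y axis by angle_deg degrees.

    skeleton: list of frames, each a list of (x, y, z) joints.
    A single-axis rotation is only one example of a 3D transformation.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [
        [(c * x + s * z, y, -s * x + c * z) for (x, y, z) in frame]
        for frame in skeleton
    ]


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_fusion(per_model_logits, weights=None):
    """Late fusion: weighted average of each model's class probabilities.

    Returns (predicted_class_index, fused_probabilities).
    Equal weights are assumed when none are given.
    """
    n = len(per_model_logits)
    weights = weights or [1.0 / n] * n
    probs = [softmax(logits) for logits in per_model_logits]
    num_classes = len(probs[0])
    fused = [
        sum(w * p[k] for w, p in zip(weights, probs))
        for k in range(num_classes)
    ]
    best = max(range(num_classes), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    inertial = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]   # 2 frames, 3 channels
    skeleton = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]  # 1 frame, 2 joints
    print(jitter(inertial, sigma=0.01))
    print(rotate_skeleton_y(skeleton, 15.0))
    # Made-up logits standing in for the 1D-CNN, 2D-CNN and BiGRU outputs.
    print(score_fusion([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.5, 0.0, 0.0]]))
```

Score fusion operates on each network's final class scores, whereas the feature fusion mentioned in the abstract would instead combine intermediate representations before classification; that variant is not sketched here.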
Pages: 189-208
Page count: 19
Source
Journal of Ambient Intelligence and Humanized Computing, 2020, 11 (01): 189-208