Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition

Cited by: 2

Authors:

Javed Imran

Balasubramanian Raman

Affiliation:

[1] Indian Institute of Technology Roorkee, Department of Computer Science and Engineering

Source:

Journal of Ambient Intelligence and Humanized Computing | 2020, Vol. 11

Keywords:

Human action recognition; Deep learning; Convolutional neural network; Recurrent neural network; Multimodal fusion

DOI:

Not available

Abstract:

Fusion of multiple modalities from different sensors is an important area of research for multimodal human action recognition. In this paper, we conduct an in-depth study of the effect of different parameters, such as input preprocessing, data augmentation, network architecture and model fusion, so as to arrive at a practical guideline for multimodal action recognition using the deep learning paradigm. First, for RGB videos, we propose a novel image-based descriptor called the stacked dense flow difference image (SDFDI), capable of capturing the spatio-temporal information present in a video sequence. A variety of deep 2D convolutional neural networks (CNNs) are then trained to compare our SDFDI against state-of-the-art image-based representations. Second, for the skeleton stream, we propose a data augmentation technique based on 3D transformations to facilitate training a deep neural network on small datasets. We also propose a bidirectional gated recurrent unit (BiGRU) based recurrent neural network (RNN) to model skeleton data. Third, for inertial sensor data, we propose data augmentation based on jittering with white Gaussian noise, along with a deep 1D-CNN for action classification. The outputs of these three heterogeneous networks (1D-CNN, 2D-CNN and BiGRU) are combined by a variety of model fusion approaches based on score and feature fusion. Finally, to illustrate the efficacy of the proposed framework, we test our model on the publicly available UTD-MHAD dataset and achieve an overall accuracy of 97.91%, which is about 4% higher than using each modality individually. We hope that the discussions and conclusions from this work will give researchers in related fields deeper insight, and open avenues for further studies of different multi-sensor fusion architectures.
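The abstract names three techniques that are easy to illustrate in isolation: jittering inertial signals with white Gaussian noise, augmenting skeleton sequences with 3D transformations, and score-level fusion of per-modality classifier outputs. The following is a minimal NumPy sketch of each, not the authors' implementation. The function names, the noise level `sigma`, the use of rotation as the only 3D transformation, and the weighted-average fusion rule are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def jitter(signal, sigma=0.05, rng=None):
    """Add zero-mean white Gaussian noise to a signal of shape (T, C).

    sigma is a hypothetical noise level, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    return signal + rng.normal(0.0, sigma, size=signal.shape)


def rotate_skeleton(joints, angles_deg):
    """Rotate a skeleton sequence of shape (T, J, 3) about the x, y and z axes.

    Rotation is only one possible 3D transformation; it is assumed here.
    """
    ax, ay, az = np.deg2rad(angles_deg)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    rotation = rz @ ry @ rx
    # Each joint is a row vector, so multiply by the transpose.
    return joints @ rotation.T


def score_fusion(prob_list, weights=None):
    """Fuse per-model class probabilities, each of shape (N, K), by weighted average.

    Returns the predicted labels (N,) and the fused scores (N, K).
    """
    probs = np.stack(prob_list)                      # (M, N, K)
    if weights is None:
        weights = np.full(len(prob_list), 1.0 / len(prob_list))
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, probs, axes=1)     # (N, K)
    return fused.argmax(axis=-1), fused
```

In such a setup, `jitter` and `rotate_skeleton` would be applied to training samples only, while `score_fusion` would be applied at test time to the softmax outputs of the separately trained networks. Feature-level fusion, which the abstract also mentions, would instead concatenate intermediate features before a shared classifier and is not shown here.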

Pages: 189-208

Page count: 19
