Fusion of Skeleton and RGB Features for RGB-D Human Action Recognition

Cited by: 24
Authors
Weiyao, Xu [1, 2, 3]
Muqing, Wu [1, 2]
Min, Zhao [1, 2]
Ting, Xia [3]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Lab, Adv Informat Network, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing Key Lab, Network Syst Architecture & Convergence, Beijing 100876, Peoples R China
[3] Zaozhuang Univ, Coll Optoelect Engn, Zaozhuang 277160, Peoples R China
Keywords
Skeleton; Videos; Feature extraction; Convolution; Fuses; Sensors; Streaming media; RGB-D human action recognition; feature fusion; Microsoft Kinect; attention network
DOI
10.1109/JSEN.2021.3089705
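Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract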
Microsoft Kinect outputs a multimodal signal that provides RGB videos, depth sequences, and skeleton information simultaneously, opening up new opportunities for research on human action recognition. However, how to exploit and fuse useful features from these different modalities remains a very challenging problem. Most RGB-D action recognition methods simply fuse the multimodal features, ignoring the potential semantic relationships between different modalities. In this paper, we propose a multimodal action recognition model based on a Bilinear Pooling and Attention Network (BPAN), which can effectively fuse multimodal features for RGB-D action recognition. First, we adopt efficient data preprocessing methods for RGB and skeleton data. Then, we propose a multimodal fusion network combining RGB video and skeleton sequences. The proposed BPAN module can effectively compress the RGB and skeleton features and project them into a latent subspace to obtain the fusion features. Finally, a fully connected three-layer perceptron is adopted to obtain the final classification decision. Experimental results on three public datasets demonstrate that the proposed method performs favorably compared with state-of-the-art methods.
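The fusion pipeline outlined in the abstract (project both modalities into a latent subspace, fuse them, then classify with a three-layer perceptron) can be illustrated with a minimal NumPy sketch. This is not the authors' BPAN implementation: the low-rank (Hadamard-product) form of bilinear pooling, the channel-wise softmax attention, all dimensions, and the names `init_params`, `fuse`, and `classify` are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rgb_dim, skel_dim, latent_dim, hidden_dim, num_classes, seed=0):
    """Random (untrained) weights; every size here is a hypothetical choice."""
    rng = np.random.default_rng(seed)

    def w(m, n):
        return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

    return {
        "U": w(rgb_dim, latent_dim),     # RGB -> latent subspace
        "V": w(skel_dim, latent_dim),    # skeleton -> latent subspace
        "A": w(latent_dim, latent_dim),  # attention scoring matrix
        "W1": w(latent_dim, hidden_dim), "b1": np.zeros(hidden_dim),
        "W2": w(hidden_dim, hidden_dim), "b2": np.zeros(hidden_dim),
        "W3": w(hidden_dim, num_classes), "b3": np.zeros(num_classes),
    }


def fuse(params, rgb_feat, skel_feat):
    """Low-rank bilinear pooling followed by channel-wise attention.

    rgb_feat: (batch, rgb_dim), skel_feat: (batch, skel_dim).
    Returns fused features of shape (batch, latent_dim).
    """
    # Project each modality into the shared latent subspace, then take the
    # element-wise product as a compressed stand-in for the full outer product.
    joint = np.tanh(rgb_feat @ params["U"]) * np.tanh(skel_feat @ params["V"])
    # Attention weights over latent channels (assumed form, not from the paper).
    attn = softmax(joint @ params["A"], axis=-1)
    return attn * joint


def classify(params, fused):
    """Three-layer perceptron producing class probabilities."""
    h1 = np.maximum(0.0, fused @ params["W1"] + params["b1"])
    h2 = np.maximum(0.0, h1 @ params["W2"] + params["b2"])
    return softmax(h2 @ params["W3"] + params["b3"], axis=-1)
```

With hypothetical feature sizes, `classify(params, fuse(params, rgb_feat, skel_feat))` returns one probability vector per sample. The element-wise product keeps the fused representation at `latent_dim` values instead of the `rgb_dim * skel_dim` values a full bilinear outer product would need, which is one common way to "compress" bilinear features.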
Pages: 19157 - 19164
Page count: 8
Related papers
50 in total
  • [1] Human Action Recognition Using RGB-D Image Features
    Tang, Chao
    Wang, Wenjian
    Zhang, Chen
    Peng, Hua
    Li, Wei
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (10) : 901 - 908
  • [2] Infrared and 3D Skeleton Feature Fusion for RGB-D Action Recognition
    De Boissiere, Alban Main
    Noumeir, Rita
    [J]. IEEE ACCESS, 2020, 8 : 168297 - 168308
  • [3] Action Recognition Based on Adaptive Fusion of RGB and Skeleton Features
    Guo Fuzheng
    Kong Jun
    Jiang Min
    [J]. LASER & OPTOELECTRONICS PROGRESS, 2020, 57 (20)
  • [4] Child Action Recognition in RGB and RGB-D Data
    Turarova, Aizada
    Zhanatkyzy, Aida
    Telisheva, Zhansaule
    Sabyrov, Arman
    Sandygulova, Anara
    [J]. HRI'20: COMPANION OF THE 2020 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2020, : 491 - 492
  • [5] Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices
    Andre Chaaraoui, Alexandros
    Ramon Padilla-Lopez, Jose
    Florez-Revuelta, Francisco
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, : 91 - 97
  • [6] Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition
    Javed Imran
    Balasubramanian Raman
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2020, 11 : 189 - 208
  • [7] Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition
    Imran, Javed
    Raman, Balasubramanian
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 11 (01) : 189 - 208
  • [8] MULTIMODAL FEATURE FUSION MODEL FOR RGB-D ACTION RECOGNITION
    Xu Weiyao
    Wu Muqing
    Zhao Min
    Xia Ting
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2021,
  • [9] ReadingAct RGB-D action dataset and human action recognition from local features
    Chen, Lulu
    Wei, Hong
    Ferryman, James
    [J]. PATTERN RECOGNITION LETTERS, 2014, 50 : 159 - 169
  • [10] Viewpoint Invariant RGB-D Human Action Recognition
    Liu, Jian
    Akhtar, Naveed
    Mian, Ajmal
    [J]. 2017 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING - TECHNIQUES AND APPLICATIONS (DICTA), 2017, : 261 - 268