Collaborative multimodal feature learning for RGB-D action recognition

Cited by: 6
Authors
Kong, Jun [1 ]
Liu, Tianshan [1 ]
Jiang, Min [1 ]
Affiliations
[1] Jiangnan Univ, Jiangsu Prov Engn Lab Pattern Recognit & Computat, Wuxi 214122, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
RGB-D action recognition; Multimodal data; Max-margin learning framework; Supervised matrix factorization; FUSION; DEPTH;
DOI
10.1016/j.jvcir.2019.02.013
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
The emergence of cost-effective depth sensors opens up a new dimension for RGB-D based human action recognition. In this paper, we propose a collaborative multimodal feature learning (CMFL) model for human action recognition from RGB-D sequences. Specifically, we propose a robust spatio-temporal pyramid feature (RSTPF) to capture dynamic local patterns around each human joint. The proposed CMFL model fuses multimodal data (skeleton, depth and RGB), and learns action classifiers using the fused features. The original low-level feature matrices are factorized to learn shared features and modality-specific features in a supervised fashion. The shared features describe the common structure among the three modalities, while the modality-specific features capture the intrinsic information of each modality. We formulate shared-specific feature mining and action-classifier learning in a unified max-margin framework, and solve the formulation using an iterative optimization algorithm. Experimental results on four action datasets demonstrate the efficacy of the proposed method. (C) 2019 Elsevier Inc. All rights reserved.
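The abstract's core idea, factorizing each modality's feature matrix into a component shared across modalities plus a modality-specific one, can be sketched as a toy unsupervised factorization. The function name `collaborative_factorization` and all hyperparameters below are hypothetical; the actual CMFL model additionally couples this factorization with max-margin classifier learning, which this sketch omits.

```python
import numpy as np

def collaborative_factorization(X_mods, k_shared=4, k_spec=2, lr=1e-3, iters=500, seed=0):
    """Toy shared-specific factorization: X_m ~= Zs @ Ws[m] + Zm[m] @ Wm[m].

    Zs (n x k_shared) is common to all modalities; Zm[m] is specific to
    modality m. Plain gradient descent on the squared reconstruction error;
    the real CMFL model instead solves a supervised max-margin objective.
    All X_m must describe the same n samples (rows aligned).
    """
    rng = np.random.default_rng(seed)
    n = X_mods[0].shape[0]
    Zs = rng.normal(scale=0.1, size=(n, k_shared))
    Ws = [rng.normal(scale=0.1, size=(k_shared, X.shape[1])) for X in X_mods]
    Zm = [rng.normal(scale=0.1, size=(n, k_spec)) for X in X_mods]
    Wm = [rng.normal(scale=0.1, size=(k_spec, X.shape[1])) for X in X_mods]
    for _ in range(iters):
        grad_Zs = np.zeros_like(Zs)
        for m, X in enumerate(X_mods):
            R = Zs @ Ws[m] + Zm[m] @ Wm[m] - X   # reconstruction residual of modality m
            grad_Zs += R @ Ws[m].T               # shared factor aggregates all modalities
            Ws[m] -= lr * (Zs.T @ R)
            Zm[m] -= lr * (R @ Wm[m].T)
            Wm[m] -= lr * (Zm[m].T @ R)
        Zs -= lr * grad_Zs                        # single update couples the modalities
    return Zs, Ws, Zm, Wm
```

Because `Zs` receives gradients from every modality's residual, it is pushed toward structure all modalities agree on, while each `Zm[m]` absorbs what is unique to its modality; with enough gradient steps the per-modality reconstruction error drops below that of the initial near-zero factors.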
Pages: 537-549
Page count: 13
Related Papers
50 records
  • [1] MULTIMODAL FEATURE FUSION MODEL FOR RGB-D ACTION RECOGNITION
    Xu Weiyao
    Wu Muqing
    Zhao Min
    Xia Ting
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2021,
  • [2] Temporal cues enhanced multimodal learning for action recognition in RGB-D videos
    Liu, Dan
    Meng, Fanrong
    Xia, Qing
    Ma, Zhiyuan
    Mi, Jinpeng
    Gan, Yan
    Ye, Mao
    Zhang, Jianwei
    [J]. NEUROCOMPUTING, 2024, 594
  • [3] RGB-D Action Recognition Using Multimodal Correlative Representation Learning Model
    Liu, Tianshan
    Kong, Jun
    Jiang, Min
    [J]. IEEE SENSORS JOURNAL, 2019, 19 (05) : 1862 - 1872
  • [4] Multimodal Deep Learning for Robust RGB-D Object Recognition
    Eitel, Andreas
    Springenberg, Jost Tobias
    Spinello, Luciano
    Riedmiller, Martin
    Burgard, Wolfram
    [J]. 2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 681 - 687
  • [5] Deep Bilinear Learning for RGB-D Action Recognition
    Hu, Jian-Fang
    Zheng, Wei-Shi
    Pan, Jiahui
    Lai, Jianhuang
    Zhang, Jianguo
    [J]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 346 - 362
  • [6] Joint Deep Learning for RGB-D Action Recognition
    Qin, Xiaolei
    Ge, Yongxin
    Zhan, Liuwei
    Li, Guangrui
    Huang, Sheng
    Wang, Hongxing
    Chen, Feiyu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018,
  • [7] Discriminative Feature Learning for Efficient RGB-D Object Recognition
    Asif, Umar
    Bennamoun, Mohammed
    Sohel, Ferdous
    [J]. 2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 272 - 279
  • [8] Latent Tensor Transfer Learning for RGB-D Action Recognition
    Jia, Chengcheng
    Kong, Yu
    Ding, Zhengming
    Fu, Yun
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 87 - 96
  • [9] Discriminative Relational Representation Learning for RGB-D Action Recognition
    Kong, Yu
    Fu, Yun
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (06) : 2856 - 2865
  • [10] Child Action Recognition in RGB and RGB-D Data
    Turarova, Aizada
    Zhanatkyzy, Aida
    Telisheva, Zhansaule
    Sabyrov, Arman
    Sandygulova, Anara
    [J]. HRI'20: COMPANION OF THE 2020 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2020, : 491 - 492