RGB-D Scene Classification via Multi-modal Feature Learning

Cited by: 0
Authors
Ziyun Cai
Ling Shao
Affiliations
[1] School of Automation, Northwestern Polytechnical University
[2] College of Automation, Nanjing University of Posts and Telecommunications
[3] Inception Institute of Artificial Intelligence
Source
Cognitive Computation | 2019, Vol. 11
Keywords
Deep learning; Local fine-tuning; Convolutional neural networks; RGB-D scene classification
DOI
Not available
Abstract
Most past deep learning methods for RGB-D scene classification use global information, directly considering all pixels in the whole image for high-level tasks. Such methods retain little information about local feature distributions, and they simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From a human vision perspective, we recognize the category of an unknown scene mainly by relying on object-level information in the scene, including appearance, texture, shape, and depth; the structural distribution of different objects is also taken into consideration. Based on this observation, constructing mid-level representations from discriminative object parts is generally more attractive for scene analysis. In this paper, we propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework (LM-CNN) for RGB-D scene classification. The method effectively captures much of the local structure in RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
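To make the idea concrete, the sketch below illustrates the general pattern the abstract describes: per-modality CNN streams over local RGB and depth patches whose features are fused by a learned module rather than by plain concatenation feeding an off-the-shelf classifier. This is a minimal illustrative sketch, not the authors' LM-CNN implementation; the class name `TwoStreamFusionNet`, the layer sizes, and the patch handling are all hypothetical assumptions.

```python
# Minimal sketch (not the authors' released code) of the core idea in the abstract:
# extract CNN features from RGB and depth patches separately, then *learn* the fusion
# instead of simply concatenating features and training a separate classifier.
import torch
import torch.nn as nn


class TwoStreamFusionNet(nn.Module):
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        # One convolutional stream per modality (depth assumed encoded as 3 channels,
        # e.g. HHA); a real system would typically start from pretrained CNNs.
        self.rgb_stream = self._make_stream(feat_dim)
        self.depth_stream = self._make_stream(feat_dim)
        # Learned fusion: a small trainable network over both streams replaces
        # plain feature concatenation followed by an external classifier.
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(feat_dim, num_classes),
        )

    @staticmethod
    def _make_stream(feat_dim):
        return nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, feat_dim), nn.ReLU(inplace=True),
        )

    def forward(self, rgb_patches, depth_patches):
        # The patches carry the local, object-level information the abstract emphasizes;
        # each input tensor is (batch, 3, H, W).
        f_rgb = self.rgb_stream(rgb_patches)
        f_depth = self.depth_stream(depth_patches)
        return self.fusion(torch.cat([f_rgb, f_depth], dim=1))


if __name__ == "__main__":
    model = TwoStreamFusionNet(num_classes=19)  # e.g. SUN RGB-D scene categories
    rgb = torch.randn(4, 3, 64, 64)
    depth = torch.randn(4, 3, 64, 64)
    print(model(rgb, depth).shape)  # torch.Size([4, 19])
```

In this sketch the fusion weights are trained jointly with both streams, so the network itself decides how RGB and depth evidence are combined; how LM-CNN realizes this in detail is described in the paper itself.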
Pages: 825-840
Page count: 15
Related Papers
50 records in total
  • [1] RGB-D Scene Classification via Multi-modal Feature Learning
    Cai, Ziyun
    Shao, Ling
    COGNITIVE COMPUTATION, 2019, 11 (06) : 825 - 840
  • [2] Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling
    Wang, Anran
    Lu, Jiwen
    Wang, Gang
    Cai, Jianfei
    Cham, Tat-Jen
    COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 : 453 - 467
  • [3] RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning
    Xiong, Zhitong
    Yuan, Yuan
    Wang, Qi
    IEEE ACCESS, 2019, 7 : 106739 - 106747
  • [4] Multi-modal deep feature learning for RGB-D object detection
    Xu, Xiangyang
    Li, Yuncheng
    Wu, Gangshan
    Luo, Jiebo
    PATTERN RECOGNITION, 2017, 72 : 300 - 313
  • [5] Multi-Modal RGB-D Scene Recognition Across Domains
    Ferreri, Andrea
    Bucci, Silvia
    Tommasi, Tatiana
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2199 - 2208
  • [6] MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification
    Li, Yabei
    Zhang, Zhang
    Cheng, Yanhua
    Wang, Liang
    Tan, Tieniu
    PATTERN RECOGNITION, 2019, 90 : 436 - 449
  • [7] MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition
    Wang, Anran
    Cai, Jianfei
    Lu, Jiwen
    Cham, Tat-Jen
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1125 - 1133
  • [8] MULTI-MODAL FEATURE FUSION FOR ACTION RECOGNITION IN RGB-D SEQUENCES
    Shahroudy, Amir
    Wang, Gang
    Ng, Tian-Tsong
    2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 73 - 76
  • [9] A Multi-Modal RGB-D Object Recognizer
    Faeulhammer, Thomas
    Zillich, Michael
    Prankl, Johann
    Vincze, Markus
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 733 - 738
  • [10] Learning a deeply supervised multi-modal RGB-D embedding for semantic scene and object category recognition
    Zaki, Hasan F. M.
    Shafait, Faisal
    Mian, Ajmal
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2017, 92 : 41 - 52