RGB-D Scene Classification via Multi-modal Feature Learning

Cited by: 0
Authors
Ziyun Cai
Ling Shao
Affiliations
[1] School of Automation, Northwestern Polytechnical University
[2] College of Automation, Nanjing University of Posts and Telecommunications
[3] Inception Institute of Artificial Intelligence
Source
Cognitive Computation | 2019 / Vol. 11
Keywords
Deep learning; Local fine-tuning; Convolutional neural networks; RGB-D scene classification
DOI
Not available
Abstract
Most previous deep learning methods for RGB-D scene classification use global information and directly consider all pixels of the whole image for high-level tasks. Such methods retain little information about local feature distributions, and they simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From the perspective of human vision, we recognize the category of an unknown scene mainly from object-level information in the scene, which includes appearance, texture, shape, and depth. The structural distribution of different objects is also taken into consideration. Based on this observation, constructing mid-level representations from discriminative object parts is generally more attractive for scene analysis. In this paper, we propose a new convolutional neural network (CNN)-based local multi-modal feature learning framework (LM-CNN) for RGB-D scene classification. This method effectively captures much of the local structure of RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
Pages: 825 - 840
Number of pages: 15
Related papers
50 in total
  • [31] A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling
    Asif, Umar
    Bennamoun, Mohammed
    Sohel, Ferdous A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (09) : 2051 - 2065
  • [32] MMPL-Net: multi-modal prototype learning for one-shot RGB-D segmentation
    Shan, Dexing
    Zhang, Yunzhou
    Liu, Xiaozheng
    Liu, Shitong
    Coleman, Sonya A.
    Kerr, Dermot
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (14) : 10297 - 10310
  • [34] Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking
    Jiang, Ming-xin
    Deng, Chao
    Shan, Jing-song
    Wang, Yuan-yuan
    Jia, Yin-jie
    Sun, Xing
    INFORMATION FUSION, 2019, 50 : 1 - 8
  • [35] Eulerian Magnification of Multi-Modal RGB-D Video for Heart Rate Estimation
    Dosso, Yasmina Souley
    Bekele, Amente
    Green, James R.
    2018 IEEE INTERNATIONAL SYMPOSIUM ON MEDICAL MEASUREMENTS AND APPLICATIONS (MEMEA), 2018 : 642 - 647
  • [36] Multi-modal deep learning for Fuji apple detection using RGB-D cameras and their radiometric capabilities
    Gene-Mola, Jordi
    Vilaplana, Veronica
    Rosell-Polo, Joan R.
    Morros, Josep-Ramon
    Ruiz-Hidalgo, Javier
    Gregorio, Eduard
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2019, 162 : 689 - 698
  • [37] RGB-D Object Discovery via Multi-Scene Analysis
    Herbst, Evan
    Ren, Xiaofeng
    Fox, Dieter
    2011 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2011
  • [38] DF2Net: A Discriminative Feature Learning and Fusion Network for RGB-D Indoor Scene Classification
    Li, Yabei
    Zhang, Junge
    Cheng, Yanhua
    Huang, Kaiqi
    Tan, Tieniu
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018 : 7041 - 7048
  • [39] Structure-Aware Multimodal Feature Fusion for RGB-D Scene Classification and Beyond
    Wang, Anran
    Cai, Jianfei
    Lu, Jiwen
    Cham, Tat-Jen
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (02)
  • [40] Learning Multiviewpoint Context-Aware Representation for RGB-D Scene Classification
    Zheng, Yingbin
    Ye, Hao
    Wang, Li
    Pu, Jian
    IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (01) : 30 - 34