RGB-D Scene Classification via Multi-modal Feature Learning

Cited by: 0
Authors
Ziyun Cai
Ling Shao
Affiliations
[1] School of Automation, Northwestern Polytechnical University
[2] College of Automation, Nanjing University of Posts and Telecommunications
[3] Inception Institute of Artificial Intelligence
Source
Cognitive Computation | 2019, Vol. 11
Keywords
Deep learning; Local fine-tuning; Convolutional neural networks; RGB-D scene classification
DOI
Not available
Abstract
Most past deep learning methods proposed for RGB-D scene classification use global information, directly considering all pixels of the whole image for high-level tasks. Such methods retain little information about local feature distributions, and they simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From a human-vision perspective, we recognize the category of an unknown scene mainly by relying on object-level information, which includes appearance, texture, shape, and depth; the structural distribution of the different objects is also taken into account. Based on this observation, constructing mid-level representations from discriminative object parts is generally more attractive for scene analysis. In this paper, we propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework (LM-CNN) for RGB-D scene classification. The method effectively captures much of the local structure in RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
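The key distinction the abstract draws is between naive feature concatenation and a learned fusion of the two modality streams. The following is a minimal NumPy sketch of that contrast; all names, feature dimensions, and the specific fusion form (a joint linear projection with a nonlinearity) are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical object-level features from the RGB and depth streams
# (dimensions are illustrative, not taken from the paper).
rgb_feat = rng.standard_normal(128)
depth_feat = rng.standard_normal(128)

def concat_fusion(f_rgb, f_depth):
    """Naive baseline: concatenate modality features and leave the
    burden of relating them entirely to the downstream classifier."""
    return np.concatenate([f_rgb, f_depth])

def learned_fusion(f_rgb, f_depth, w_rgb, w_depth):
    """Sketch of a learned fusion layer: a joint projection whose
    weights would be trained end-to-end with the classifier, letting
    the network exploit correlation and complementarity between
    modalities instead of treating them independently."""
    return np.tanh(w_rgb @ f_rgb + w_depth @ f_depth)

# Randomly initialized projection weights (trained in practice).
w_rgb = 0.1 * rng.standard_normal((64, 128))
w_depth = 0.1 * rng.standard_normal((64, 128))

print(concat_fusion(rgb_feat, depth_feat).shape)            # (256,)
print(learned_fusion(rgb_feat, depth_feat, w_rgb, w_depth).shape)  # (64,)
```

Concatenation doubles the feature dimension and keeps the modalities disjoint, whereas the learned projection produces a compact joint representation; the abstract argues the latter is what a fusion strategy learned at the object level provides.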
Pages: 825-840 (15 pages)
Related Papers (50 total)
  • [21] Multi-modal uniform deep learning for RGB-D person re-identification
    Ren, Liangliang
    Lu, Jiwen
    Feng, Jianjiang
    Zhou, Jie
    PATTERN RECOGNITION, 2017, 72 : 446 - 457
  • [22] Large-Margin Multi-Modal Deep Learning for RGB-D Object Recognition
    Wang, Anran
    Lu, Jiwen
    Cai, Jianfei
    Cham, Tat-Jen
    Wang, Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (11) : 1887 - 1898
  • [23] Multi-modal deep learning networks for RGB-D pavement waste detection and recognition
    Li, Yangke
    Zhang, Xinman
    WASTE MANAGEMENT, 2024, 177 : 125 - 134
  • [24] RGB-D Scene Classification via Heterogeneous Model Fusion
    Liu, Xinda
    Wang, Xueming
    Jiang, Shuqiang
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 499 - 503
  • [25] Unsupervised Feature Learning for RGB-D Image Classification
    Jhuo, I-Hong
    Gao, Shenghua
    Zhuang, Liansheng
    Lee, D. T.
    Ma, Yi
    COMPUTER VISION - ACCV 2014, PT I, 2015, 9003 : 276 - 289
  • [26] Modality and Component Aware Feature Fusion for RGB-D Scene Classification
    Wang, Anran
    Cai, Jianfei
    Lu, Jiwen
    Cham, Tat-Jen
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 5995 - 6004
  • [27] Cross-Level Multi-Modal Features Learning With Transformer for RGB-D Object Recognition
    Zhang, Ying
    Yin, Maoliang
    Wang, Heyong
    Hua, Changchun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7121 - 7130
  • [28] Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images
    Xu, Ke
    Zhu, Yan
    Cao, Weixing
    Jiang, Xiaoping
    Jiang, Zhijian
    Li, Shuailong
    Ni, Jun
    FRONTIERS IN PLANT SCIENCE, 2021, 12
  • [29] Unsupervised Joint Feature Learning and Encoding for RGB-D Scene Labeling
    Wang, Anran
    Lu, Jiwen
    Cai, Jianfei
    Wang, Gang
    Cham, Tat-Jen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (11) : 4459 - 4473
  • [30] MDSC-Net: Multi-Modal Discriminative Sparse Coding Driven RGB-D Classification Network
    Xu, Jingyi
    Deng, Xin
    Fu, Yibing
    Xu, Mai
    Li, Shengxi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 442 - 454