RGB-D Scene Classification via Multi-modal Feature Learning

Authors
Ziyun Cai
Ling Shao
Affiliations
[1] School of Automation, Northwestern Polytechnical University
[2] College of Automation, Nanjing University of Posts and Telecommunications
[3] Inception Institute of Artificial Intelligence
Source
Cognitive Computation | 2019 / Vol. 11
Keywords
Deep learning; Local fine-tuning; Convolutional neural networks; RGB-D scene classification
DOI
Not available
Abstract
Most previous deep learning methods for RGB-D scene classification use global information and feed all pixels of the whole image directly into high-level tasks. Such methods retain little information about local feature distributions, and they simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From the perspective of human vision, we recognize the category of an unknown scene mainly from the object-level information in the scene, which includes appearance, texture, shape, and depth. The structural distribution of the different objects is also taken into account. Based on this observation, constructing mid-level representations from discriminative object parts is generally more attractive for scene analysis. In this paper, we propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework (LM-CNN) for RGB-D scene classification. The method effectively captures much of the local structure of RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
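The contrast the abstract draws between simple concatenation of RGB and depth features and a learned fusion strategy can be illustrated with a minimal sketch. This is a generic gated fusion written for illustration only; it is not the LM-CNN architecture, which the abstract does not specify. The function names and the gate parameters (`gate_weights`, `gate_bias`) are hypothetical.

```python
import math


def concat_fusion(rgb_feat, depth_feat):
    """Baseline: join the two modality feature vectors end to end."""
    return list(rgb_feat) + list(depth_feat)


def gated_fusion(rgb_feat, depth_feat, gate_weights, gate_bias):
    """Illustrative learned fusion (an assumption, not the paper's method).

    For each feature dimension i, a sigmoid gate is computed from BOTH
    modalities, then used to mix the RGB and depth values. In a real
    model gate_weights and gate_bias would be trained; here they are
    supplied by the caller.
    """
    joint = list(rgb_feat) + list(depth_feat)
    fused = []
    for i, (r, d) in enumerate(zip(rgb_feat, depth_feat)):
        z = sum(w * x for w, x in zip(gate_weights[i], joint)) + gate_bias[i]
        gate = 1.0 / (1.0 + math.exp(-z))  # value in (0, 1)
        fused.append(gate * r + (1.0 - gate) * d)
    return fused


# Toy feature vectors for one local image patch (made-up numbers).
rgb = [1.0, 2.0]
depth = [3.0, 4.0]

# With all-zero gate parameters the gate is 0.5 everywhere, so the
# fused vector is the plain average of the two modalities.
zero_weights = [[0.0] * 4 for _ in range(2)]
zero_bias = [0.0, 0.0]

concat_result = concat_fusion(rgb, depth)
gated_result = gated_fusion(rgb, depth, zero_weights, zero_bias)
```

Concatenation doubles the feature length and leaves any cross-modal weighting to the downstream classifier, whereas the gate lets the mixing ratio depend on the input itself, which is one simple way a fusion strategy can be learned rather than fixed.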
Pages: 825-840
Page count: 15