RGB-D Scene Classification via Multi-modal Feature Learning

Times cited: 0

Authors:

Ziyun Cai

Ling Shao

Affiliations:

[1] School of Automation, Northwestern Polytechnical University

[2] College of Automation, Nanjing University of Posts and Telecommunications

[3] Inception Institute of Artificial Intelligence

Source:

Cognitive Computation | 2019 / Vol. 11

Keywords:

Deep learning; Local fine-tuning; Convolutional neural networks; RGB-D scene classification

DOI:

Not available

Abstract:

Most previous deep learning methods for RGB-D scene classification rely on global information, directly considering every pixel of the whole image for high-level tasks. Such methods retain little information about local feature distributions, and they simply concatenate RGB and depth features without exploring the correlation and complementarity between the raw RGB and depth images. From the perspective of human vision, we recognize the category of an unknown scene mainly from object-level information in the scene, including appearance, texture, shape, and depth, while also taking the structural distribution of the different objects into account. Based on this observation, constructing mid-level representations from discriminative object parts is generally a more attractive approach to scene analysis. In this paper, we propose a new local multi-modal feature learning framework based on Convolutional Neural Networks (CNNs), termed LM-CNN, for RGB-D scene classification. The method effectively captures much of the local structure in RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experimental results on two popular datasets, the NYU v1 Depth dataset and the SUN RGB-D dataset, show that our method with local multi-modal CNNs outperforms state-of-the-art methods.
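
The abstract's point that scene recognition should draw on local, object-level evidence present in both modalities can be illustrated with a minimal Python sketch. It crops spatially aligned windows from an RGB image and its registered depth map, so every local region has both an appearance view and a depth view, and then summarizes each region with a toy descriptor. The regular grid, the `size` and `stride` parameters, and the hypothetical helpers `aligned_patches` and `patch_descriptor` are assumptions for illustration only. This is not LM-CNN: the paper uses CNN features and learns the RGB-D fusion, whereas this sketch just concatenates hand-crafted statistics.

```python
from typing import List, Tuple

# Hypothetical types for this sketch: an RGB image as rows of (r, g, b) tuples
# and a depth map as rows of floats, both H x W and assumed pixel-registered.
RGB = List[List[Tuple[int, int, int]]]
Depth = List[List[float]]


def aligned_patches(rgb: RGB, depth: Depth, size: int, stride: int):
    """Yield (top, left, rgb_patch, depth_patch) for each window on a regular grid.

    The same window is cropped from both modalities, so each local region
    has an RGB view and a depth view (the grid itself is an assumption).
    """
    h, w = len(rgb), len(rgb[0])
    if (len(depth), len(depth[0])) != (h, w):
        raise ValueError("RGB and depth must have the same spatial size")
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            rgb_patch = [row[left:left + size] for row in rgb[top:top + size]]
            depth_patch = [row[left:left + size] for row in depth[top:top + size]]
            yield top, left, rgb_patch, depth_patch


def patch_descriptor(rgb_patch, depth_patch) -> List[float]:
    """Toy local descriptor: mean R, G, B followed by mean depth.

    A CNN would replace this in practice; here it only shows that each
    local feature draws on both modalities of the same region.
    """
    pixels = [p for row in rgb_patch for p in row]
    n = len(pixels)
    mean_rgb = [sum(p[c] for p in pixels) / n for c in range(3)]
    depths = [d for row in depth_patch for d in row]
    mean_depth = sum(depths) / len(depths)
    return mean_rgb + [mean_depth]


if __name__ == "__main__":
    # 4 x 4 synthetic scene: left half red and near, right half blue and far.
    rgb = [[(255, 0, 0)] * 2 + [(0, 0, 255)] * 2 for _ in range(4)]
    depth = [[1.0] * 2 + [3.0] * 2 for _ in range(4)]
    for top, left, rp, dp in aligned_patches(rgb, depth, size=2, stride=2):
        print(top, left, patch_descriptor(rp, dp))
```

On the synthetic scene, the patch at (0, 0) yields `[255.0, 0.0, 0.0, 1.0]` and the patch at (0, 2) yields `[0.0, 0.0, 255.0, 3.0]`, so each local feature keeps appearance and depth tied to the same region, something a single global descriptor would average away.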

Pages: 825-840

Number of pages: 15
