RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Cited by: 15
Authors
Xiong, Zhitong [1 ,2 ]
Yuan, Yuan [1 ]
Wang, Qi [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D; scene recognition; global and local features; multi-modal feature learning;
DOI
10.1109/ACCESS.2019.2932080
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
RGB-D image-based scene recognition has achieved significant performance improvements with the development of deep learning methods. While convolutional neural networks can learn high-semantic-level features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another is that scene images are usually not object-centric and exhibit great spatial variability, so vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn global modal-specific and local modal-consistent features simultaneously; different from existing approaches, local CNN features are considered for learning the modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-semantic-level CNN feature maps and is more efficient and effective than object-detection and dense patch-sampling based methods; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for training the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities without extra annotations. Finally, by concatenating the global and local features, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.
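The core idea behind the KFS module described above, selecting a few important local feature vectors from a CNN feature map and fusing them with a global descriptor, can be sketched as follows. This is an illustrative assumption, not the authors' implementation: the attention score here is simply the L2 norm of each spatial feature column, whereas the paper learns attention under the triplet correlation and spatial-attention similarity losses.

```python
import numpy as np

def key_feature_selection(feature_map, k=4):
    """Illustrative sketch: pick the k most salient local feature
    vectors from a CNN feature map of shape (C, H, W) and fuse them
    with a global average-pooled descriptor.

    The saliency score (L2 norm per spatial location) is a stand-in
    for the learned attention used in the paper's KFS module.
    """
    C, H, W = feature_map.shape
    columns = feature_map.reshape(C, H * W)       # one C-dim vector per location
    scores = np.linalg.norm(columns, axis=0)      # saliency score per location
    top_idx = np.argsort(scores)[::-1][:k]        # indices of the k highest scores
    local = columns[:, top_idx]                   # (C, k) selected local features
    global_feat = columns.mean(axis=1)            # (C,) global average pooling
    # Concatenate global and local parts into one descriptor
    return np.concatenate([global_feat, local.T.reshape(-1)])

rng = np.random.default_rng(0)
fmap = rng.standard_normal((512, 7, 7))           # e.g. a conv5-style feature map
fused = key_feature_selection(fmap, k=4)
print(fused.shape)                                # (512 + 4*512,) = (2560,)
```

In the paper this selection is applied to both the RGB and the depth streams, and the selected local features are what the modal-consistent losses operate on; the norm-based scoring above merely illustrates the data flow.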
Pages: 106739 - 106747
Page count: 9
Related Papers
50 records
  • [31] Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images
    Yu Zheng
    Xinbo Gao
    Multimedia Tools and Applications, 2017, 76 : 4427 - 4443
  • [32] Unsupervised Joint Feature Learning and Encoding for RGB-D Scene Labeling
    Wang, Anran
    Lu, Jiwen
    Cai, Jianfei
    Wang, Gang
    Cham, Tat-Jen
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (11) : 4459 - 4473
  • [33] RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory
    Zeng, Hui
    Yang, Bin
    Wang, Xiuqing
    Liu, Jiwei
    Fu, Dongmei
    SENSORS, 2019, 19 (03)
  • [34] Collaborative multimodal feature learning for RGB-D action recognition
    Kong, Jun
    Liu, Tianshan
    Jiang, Min
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 59 : 537 - 549
  • [35] Discriminative Feature Learning for Efficient RGB-D Object Recognition
    Asif, Umar
    Bennamoun, Mohammed
    Sohel, Ferdous
    2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 272 - 279
  • [36] Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition
    Zhu, Hongyuan
    Weibel, Jean-Baptiste
    Lu, Shijian
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2969 - 2976
  • [37] InstaIndoor and multi-modal deep learning for indoor scene recognition
    Glavan, Andreea
    Talavera, Estefania
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (09): 6861 - 6877
  • [39] A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling
    Asif, Umar
    Bennamoun, Mohammed
    Sohel, Ferdous A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (09) : 2051 - 2065
  • [40] Hierarchical multi-modal fusion FCN with attention model for RGB-D tracking
    Jiang, Ming-xin
    Deng, Chao
    Shan, Jing-song
    Wang, Yuan-yuan
    Jia, Yin-jie
    Sun, Xing
    INFORMATION FUSION, 2019, 50 : 1 - 8