RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Cited by: 15
Authors
Xiong, Zhitong [1, 2]
Yuan, Yuan [1]
Wang, Qi [1, 2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RGB-D; scene recognition; global and local features; multi-modal feature learning;
DOI
10.1109/ACCESS.2019.2932080
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RGB-D image-based scene recognition has improved significantly with the development of deep learning methods. While convolutional neural networks can learn high-level semantic features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another is that scene images are usually not object-centric and exhibit great spatial variability. Thus, vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper, we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn the global modal-specific and local modal-consistent features simultaneously. Different from existing approaches, local CNN features are considered for the learning of modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-level semantic CNN feature maps. It is more efficient and effective than object detection and dense patch-sampling based methods; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for training the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities with no need for extra annotations. Finally, by concatenating the global and local features together, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.
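The final fusion step described in the abstract (select a few important local features from a feature map, then concatenate them with global features) can be illustrated with a minimal sketch. This is not the authors' KFS module or network: the function names, the tiny 2 x 2 feature maps, and the per-position `scores` are all hypothetical, and the scores simply stand in for whatever importance a trained selection module would assign. For brevity, local features are taken from the RGB map only.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional local features


def global_average(feature_map: FeatureMap) -> Vector:
    """Average all spatial positions into a single C-dimensional global feature."""
    cells = [cell for row in feature_map for cell in row]
    channels = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]


def select_key_features(feature_map: FeatureMap,
                        scores: List[List[float]],
                        k: int) -> List[Vector]:
    """Return the k local features with the highest scores, best first.

    `scores` is an assumed H x W importance map; in a learned system it
    would come from a trained module rather than being supplied by hand.
    """
    ranked = [(scores[i][j], i, j)
              for i in range(len(feature_map))
              for j in range(len(feature_map[i]))]
    ranked.sort(key=lambda item: -item[0])  # stable: ties keep row-major order
    return [feature_map[i][j] for _, i, j in ranked[:k]]


def fuse(global_rgb: Vector, global_depth: Vector,
         local_features: List[Vector]) -> Vector:
    """Concatenate global features of both modalities with the selected local ones."""
    fused = list(global_rgb) + list(global_depth)
    for feature in local_features:
        fused.extend(feature)
    return fused


# Toy inputs: 2 x 2 spatial positions, 2 channels each (illustrative values only).
rgb_map = [[[1.0, 0.0], [0.0, 2.0]],
           [[3.0, 1.0], [0.0, 0.0]]]
depth_map = [[[2.0, 2.0], [2.0, 2.0]],
             [[2.0, 2.0], [2.0, 2.0]]]
scores = [[0.1, 0.7],
          [0.9, 0.2]]

key_features = select_key_features(rgb_map, scores, k=2)
fused = fuse(global_average(rgb_map), global_average(depth_map), key_features)
```

In a real pipeline the nested lists would be tensors from an array library, and the fused vector would feed a classifier; the sketch only shows the data flow of selection followed by concatenation.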
Pages: 106739-106747
Page count: 9