RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Cited by: 15
Authors
Xiong, Zhitong [1, 2]
Yuan, Yuan [1]
Wang, Qi [1, 2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D; scene recognition; global and local features; multi-modal feature learning;
DOI
10.1109/ACCESS.2019.2932080
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
RGB-D image-based scene recognition has improved significantly with the development of deep learning methods. While convolutional neural networks (CNNs) can learn high-level semantic features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another is that scene images are usually not object-centric and exhibit great spatial variability. Thus, vanilla full-image CNN features may not be optimal for scene recognition. To address these problems, we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn global modal-specific and local modal-consistent features simultaneously. Unlike existing approaches, local CNN features are considered for learning modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which adaptively selects important local features from the high-level semantic CNN feature maps. It is more efficient and effective than methods based on object detection or dense patch sampling; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for training the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities without extra annotations. Finally, by concatenating the global and local features, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D and NYU Depth version 2 (NYUD v2) datasets.
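The abstract's idea of picking important local features from CNN feature maps and concatenating them with global features can be illustrated with a minimal NumPy sketch. This is not the authors' KFS module: the abstract does not give its architecture, and the paper trains the selection with a triplet correlation loss and a spatial-attention similarity loss. Here the selection score is assumed to be the channel-wise L2 norm at each spatial location, the global feature is assumed to be average pooling, and all function names (`activation_scores`, `select_key_features`, `fuse_global_local`) are hypothetical.

```python
import numpy as np


def activation_scores(feature_map):
    """Stand-in importance score (an assumption, not the paper's learned
    attention): L2 norm over channels at each spatial location."""
    return np.linalg.norm(feature_map, axis=0)  # shape H x W


def select_key_features(feature_map, scores, k):
    """Return the k local feature vectors with the highest scores.

    feature_map: array of shape C x H x W
    scores:      array of shape H x W
    """
    c, h, w = feature_map.shape
    flat_feats = feature_map.reshape(c, h * w)
    flat_scores = scores.reshape(h * w)
    # Flat indices of the k largest scores, highest first.
    top_idx = np.argsort(-flat_scores, kind="stable")[:k]
    return flat_feats[:, top_idx].T, top_idx  # shapes k x C and (k,)


def fuse_global_local(rgb_map, depth_map, k):
    """Concatenate per-modality global features with selected local ones."""
    # Global features: average pooling over all spatial positions (assumed).
    global_rgb = rgb_map.mean(axis=(1, 2))
    global_depth = depth_map.mean(axis=(1, 2))
    # Local features: top-k locations chosen independently per modality.
    local_rgb, _ = select_key_features(rgb_map, activation_scores(rgb_map), k)
    local_depth, _ = select_key_features(
        depth_map, activation_scores(depth_map), k
    )
    return np.concatenate(
        [global_rgb, global_depth, local_rgb.ravel(), local_depth.ravel()]
    )


# Toy usage with random 8-channel, 4 x 4 feature maps.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))
depth = rng.standard_normal((8, 4, 4))
fused = fuse_global_local(rgb, depth, k=3)
print(fused.shape)  # (64,) = 8 + 8 + 3*8 + 3*8
```

In this sketch each modality selects its locations independently; the abstract indicates that the proposed losses instead supervise the selection so that the local features of the two modalities are consistent.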
Pages: 106739-106747
Page count: 9
Related papers
50 in total
  • [21] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [22] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
    Du, Dapeng
    Wang, Limin
    Li, Zhaoyang
    Wu, Gangshan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (08) : 2309 - 2327
  • [23] Multi-modal deep network for RGB-D segmentation of clothes
    Joukovsky, B.
    Hu, P.
    Munteanu, A.
    ELECTRONICS LETTERS, 2020, 56 (09) : 432 - 434
  • [24] Cross-Modal Pyramid Translation for RGB-D Scene Recognition
    Dapeng Du
    Limin Wang
    Zhaoyang Li
    Gangshan Wu
    International Journal of Computer Vision, 2021, 129 : 2309 - 2327
  • [25] RGB-D Image Saliency Detection Based on Multi-modal Feature-fused Supervision
    Liu Zhengyi
    Duan Quntao
    Shi Song
    Zhao Peng
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (04) : 997 - 1004
  • [26] Multi-modal uniform deep learning for RGB-D person re-identification
    Ren, Liangliang
    Lu, Jiwen
    Feng, Jianjiang
    Zhou, Jie
    PATTERN RECOGNITION, 2017, 72 : 446 - 457
  • [27] RGB-D Face Recognition via Deep Complementary and Common Feature Learning
    Zhang, Hao
    Han, Hu
    Cui, Jiyun
    Shan, Shiguang
    Chen, Xilin
    PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 8 - 15
  • [28] Indoor Scene Recognition from RGB-D Images by Learning Scene Bases
    Wan, Shaohua
    Hu, Changbo
    Aggarwal, J. K.
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 3416 - 3421
  • [29] Multi-Modal Deep Learning for Weeds Detection in Wheat Field Based on RGB-D Images
    Xu, Ke
    Zhu, Yan
    Cao, Weixing
    Jiang, Xiaoping
    Jiang, Zhijian
    Li, Shuailong
    Ni, Jun
    FRONTIERS IN PLANT SCIENCE, 2021, 12
  • [30] Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images
    Zheng, Yu
    Gao, Xinbo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (03) : 4427 - 4443