Multi-Modal RGB-D Scene Recognition Across Domains

Cited by: 2
Authors:
Ferreri, Andrea [1]
Bucci, Silvia [1,2]
Tommasi, Tatiana [1,2]
Affiliations:
[1] Politecn Torino, Turin, Italy
[2] Italian Inst Technol, Genoa, Italy
DOI:
10.1109/ICCVW54120.2021.00249
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
Scene recognition is one of the fundamental problems in computer vision research, with extensive applications in robotics. When available, depth images provide helpful geometric cues that complement RGB texture information and help to identify discriminative scene features. Depth sensing technology has developed rapidly in recent years, and a great variety of 3D cameras have been introduced, each with different acquisition properties. However, those properties are often neglected when assembling large data collections, so multi-modal images are gathered with no regard for their original acquisition conditions. In this work, we put the spotlight on a potentially severe domain shift issue within multi-modal scene recognition datasets: a scene classification model trained on data from one camera may not generalize to data from a different camera, yielding low recognition performance. Starting from the well-known SUN RGB-D dataset, we design an experimental testbed to study this problem and use it to benchmark the performance of existing methods. Finally, we introduce a novel adaptive scene recognition approach that leverages self-supervised translation between modalities. Learning to map RGB to depth and vice versa is an unsupervised procedure that can be trained jointly on data from multiple cameras and may help to bridge the gap among the extracted feature distributions. Our experimental results confirm the effectiveness of the proposed approach.
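
To make the idea of self-supervised modality translation concrete, the sketch below shows how such an auxiliary objective could be attached to a two-stream RGB-D scene classifier in PyTorch. This is a minimal illustration under assumptions, not the authors' implementation: the encoder/decoder sizes, the class count (num_classes=19), the L1 reconstruction loss, and the 0.1 loss weight are all placeholders chosen for clarity.

# Minimal sketch (PyTorch), NOT the paper's code: a two-stream RGB-D scene classifier
# with self-supervised cross-modal translation (RGB->depth, depth->RGB) as an auxiliary
# task. All layer sizes, the class count, the L1 loss, and the 0.1 weight are assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    # Small convolutional encoder for one modality (RGB, or depth encoded as 3 channels).
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class TranslationDecoder(nn.Module):
    # Decodes the features of one modality into an image of the other modality.
    def __init__(self, feat_ch=64, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
        )
    def forward(self, f):
        return self.net(f)

class CrossModalSceneNet(nn.Module):
    def __init__(self, num_classes=19, feat_ch=64):  # 19 is an illustrative class count
        super().__init__()
        self.rgb_enc = ModalityEncoder(3, feat_ch)
        self.dep_enc = ModalityEncoder(3, feat_ch)
        self.rgb2dep = TranslationDecoder(feat_ch, 3)  # predict depth from RGB features
        self.dep2rgb = TranslationDecoder(feat_ch, 3)  # predict RGB from depth features
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2 * feat_ch, num_classes)
        )
    def forward(self, rgb, depth):
        f_rgb, f_dep = self.rgb_enc(rgb), self.dep_enc(depth)
        logits = self.classifier(torch.cat([f_rgb, f_dep], dim=1))
        # The translations need no scene labels, so this branch can also be trained on
        # unlabelled images coming from a different (target) camera.
        return logits, self.rgb2dep(f_rgb), self.dep2rgb(f_dep)

# Toy training step: supervised classification plus the self-supervised translation loss.
model = CrossModalSceneNet()
ce, l1 = nn.CrossEntropyLoss(), nn.L1Loss()
rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
labels = torch.tensor([0, 3])
logits, pred_dep, pred_rgb = model(rgb, depth)
loss = ce(logits, labels) + 0.1 * (l1(pred_dep, depth) + l1(pred_rgb, rgb))
loss.backward()
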
Pages: 2199-2208
Page count: 10
Related Papers (50 total):
  • [1] RGB-D Scene Classification via Multi-modal Feature Learning
    Cai, Ziyun
    Shao, Ling
    COGNITIVE COMPUTATION, 2019, 11 (06) : 825 - 840
  • [2] Multi-modal Unsupervised Feature Learning for RGB-D Scene Labeling
    Wang, Anran
    Lu, Jiwen
    Wang, Gang
    Cai, Jianfei
    Cham, Tat-Jen
    COMPUTER VISION - ECCV 2014, PT V, 2014, 8693 : 453 - 467
  • [3] RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning
    Xiong, Zhitong
    Yuan, Yuan
    Wang, Qi
    IEEE ACCESS, 2019, 7 : 106739 - 106747
  • [4] Learning a deeply supervised multi-modal RGB-D embedding for semantic scene and object category recognition
    Zaki, Hasan F. M.
    Shafait, Faisal
    Mian, Ajmal
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2017, 92 : 41 - 52
  • [5] MULTI-MODAL FEATURE FUSION FOR ACTION RECOGNITION IN RGB-D SEQUENCES
    Shahroudy, Amir
    Wang, Gang
    Ng, Tian-Tsong
    2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 73 - 76
  • [6] A Multi-Modal RGB-D Object Recognizer
    Faeulhammer, Thomas
    Zillich, Michael
    Prankl, Johann
    Vincze, Markus
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 733 - 738
  • [7] MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification
    Li, Yabei
    Zhang, Zhang
    Cheng, Yanhua
    Wang, Liang
    Tan, Tieniu
    PATTERN RECOGNITION, 2019, 90 : 436 - 449
  • [8] DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation
    Yuan, Jianzhong
    Zhou, Wujie
    Luo, Ting
    IEEE ACCESS, 2019, 7 : 169350 - 169358
  • [9] RGB-D based multi-modal deep learning for spacecraft and debris recognition
    AlDahoul, Nouar
    Karim, Hezerul Abdul
    Momo, Mhd Adel
    SCIENTIFIC REPORTS, 2022, 12 (01)