A Multi-modal Graphical Model for Scene Analysis

Cited by: 15
Authors
Namin, Sarah Taghavi [1]
Najafi, Mohammad [1]
Salzmann, Mathieu [1]
Petersson, Lars [1]
Affiliations
[1] Australian Natl Univ, NICTA, Canberra, ACT 0200, Australia
Keywords
SEMANTIC SEGMENTATION
DOI
10.1109/WACV.2015.139
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a multi-modal graphical model to address the problem of semantic segmentation using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where corresponding 2D and 3D regions are forced to receive identical labels. This degrades performance in the presence of misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in one modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. We evaluate our method on a publicly available dataset and outperform the state of the art. Additionally, to demonstrate the ability of our model to support multiple correspondences for objects in the 3D and 2D domains, we introduce a new multi-modal dataset, which is composed of panoramic images and LIDAR data and features a rich set of many-to-one correspondences.
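The difference between hard and soft correspondences described in the abstract can be illustrated with a toy energy function. The sketch below is not the authors' model: it assumes a simple Potts-style penalty on each 2D-3D link, brute-force minimisation, and made-up unary costs. The names `energy`, `minimise`, `links` and `weight` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def energy(labels_2d, labels_3d, unary_2d, unary_3d, links, weight):
    """Unary costs plus a fixed penalty for every 2D-3D link whose ends disagree."""
    e = sum(unary_2d[i][l] for i, l in enumerate(labels_2d))
    e += sum(unary_3d[j][l] for j, l in enumerate(labels_3d))
    # Assumed Potts-style coupling: pay `weight` when linked regions differ.
    e += sum(weight for i, j in links if labels_2d[i] != labels_3d[j])
    return e


def minimise(unary_2d, unary_3d, links, weight):
    """Exhaustively search all labelings (only feasible for tiny toy problems)."""
    n_labels = len(unary_2d[0])
    n2, n3 = len(unary_2d), len(unary_3d)
    best = None
    for assignment in product(range(n_labels), repeat=n2 + n3):
        l2, l3 = assignment[:n2], assignment[n2:]
        e = energy(l2, l3, unary_2d, unary_3d, links, weight)
        if best is None or e < best[0]:
            best = (e, l2, l3)
    return best


# Made-up costs: two 2D regions linked to one 3D region (many-to-one).
unary_2d = [[0.2, 1.0],   # 2D region 0 prefers label 0
            [1.0, 0.1]]   # 2D region 1 prefers label 1 (e.g. a misaligned occluder)
unary_3d = [[0.3, 0.9]]   # the 3D region prefers label 0
links = [(0, 0), (1, 0)]  # (2D index, 3D index) pairs

soft = minimise(unary_2d, unary_3d, links, weight=0.5)
hard = minimise(unary_2d, unary_3d, links, weight=1e6)  # approximates a hard constraint
print("soft:", soft[1], soft[2])
print("hard:", hard[1], hard[2])
```

With these illustrative costs, the soft setting lets the second 2D region keep label 1 despite its link to the 3D region, whereas the very large weight forces all three regions to share label 0, overriding that region's own evidence.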
Pages: 1006-1013
Number of pages: 8