MRFTrans: Multimodal Representation Fusion Transformer for monocular 3D semantic scene completion

Times cited: 0
Authors
Xu, Rongtao [1 ,3 ]
Zhang, Jiguang
Sun, Jiaxi
Wang, Changwei [1 ,3 ]
Wu, Yifan [4 ]
Xu, Shibiao [2 ]
Meng, Weiliang [1 ,3 ]
Zhang, Xiaopeng [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Semantic scene completion; Transformer; Multimodal representation fusion; NETWORK; SENSOR;
DOI
10.1016/j.inffus.2024.102493
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The complete understanding of 3D scenes is crucial in robotic visual perception, impacting tasks such as motion planning and map localization. However, due to the limited field of view and scene occlusion constraints of sensors, inferring complete scene geometry and semantic information from restricted observations is challenging. In this work, we propose a novel Multimodal Representation Fusion Transformer framework (MRFTrans) that robustly fuses semantic, geometric occupancy, and depth representations for monocular image-based scene completion. MRFTrans centers on an affinity representation fusion transformer, integrating geometric occupancy and semantic relationships within a transformer architecture. This integration enables the modeling of long-range dependencies within scenes for inferring missing information. Additionally, we present a depth representation fusion method, efficiently extracting reliable depth knowledge from biased monocular estimates. Extensive experiments demonstrate MRFTrans's superiority, setting a new benchmark on SemanticKITTI and NYUv2 datasets. It significantly enhances completeness and accuracy, particularly in large structures, movable objects, and scene components with major occlusions. The results underscore the benefits of the affinity-aware transformer and robust depth fusion in monocular image-based completion.
Pages: 10