Instance-Aware Monocular 3D Semantic Scene Completion

Citations: 0

Authors:

Xiao, Haihong [1]

Xu, Hongbin [1]

Kang, Wenxiong [1]

Li, Yuqiong [2]

Affiliations:

[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 511442, Peoples R China

[2] Chinese Acad Sci, Inst Mech, Key Lab Mech Fluid Solid Coupling Syst, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS | 2024 / Vol. 25 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

3D scene understanding; semantic scene completion; 3D vision;

DOI:

10.1109/TITS.2023.3344806

Chinese Library Classification (CLC) number:

TU [Building Science];

Discipline classification code:

0813 ;

Abstract:

We study outdoor 3D scene understanding, a challenging task that requires an intelligent system to infer both geometry and semantics from a single-view image, a critical skill for autonomous vehicles navigating the real 3D world. To this end, we present an instance-aware monocular semantic scene completion framework. To the best of our knowledge, this is the first endeavor specifically targeting the challenge of instance perception in the camera-based semantic scene completion task. Our method consists of two stages. In stage I, we design a region-based VQ-VAE network, providing an effective solution for 3D occupancy prediction. In stage II, we first introduce an instance-aware attention module that explicitly incorporates instance-level cues captured from mask images to enhance the instance features in RGB images. We then leverage deformable cross-attention to aggregate the image features corresponding to each voxel query and use deformable self-attention to refine the query proposals. We combine these key ingredients and evaluate our method on two challenging datasets, namely SemanticKITTI and SSCBench-KITTI-360. The results demonstrate the superiority of our proposed method over the state-of-the-art VoxFormer-S. Specifically, our method surpasses VoxFormer-S by 0.22 IoU and 0.72 mIoU on the validation set and achieves an improvement of 3.04 IoU and 1.06 mIoU on the SSCBench-KITTI-360 validation set. Meanwhile, our approach ensures accurate perception of critical instances, indicating its potential for practical deployment.


Pages: 6543 - 6554

Number of pages: 12

50 results in total

[1] MonoScene: Monocular 3D Semantic Scene Completion
Anh-Quan Cao
de Charette, Raoul
[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 3981 - 3991
[2] Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery
Grinvald, Margarita
Furrer, Fadri
Novkovic, Tonci
Chung, Jen Jen
Cadena, Cesar
Siegwart, Roland
Nieto, Juan
[J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2019, 4 (03) : 3037 - 3044
[3] INSTANCE-AWARE SIMPLIFICATION OF 3D POLYGONAL MESHES
Azim, Tahir
Cheslack-Postava, Ewen
Levis, Philip
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015,
[4] Instance-Aware Scene Layout Forecasting
Qiao, Xiaotian
Zheng, Quanlong
Cao, Ying
Lau, Rynson W. H.
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (02) : 504 - 516
[5] Instance-Aware Scene Layout Forecasting
Xiaotian Qiao
Quanlong Zheng
Ying Cao
Rynson W. H. Lau
[J]. International Journal of Computer Vision, 2022, 130 : 504 - 516
[6] MRFTrans: Multimodal Representation Fusion Transformer for monocular 3D semantic scene completion
Xu, Rongtao
Zhang, Jiguang
Sun, Jiaxi
Wang, Changwei
Wu, Yifan
Xu, Shibiao
Meng, Weiliang
Zhang, Xiaopeng
[J]. INFORMATION FUSION, 2024, 111
[7] Semantic Point Completion Network for 3D Semantic Scene Completion
Zhong, Min
Zeng, Gang
[J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2824 - 2831
[8] 3D Semantic Scene Completion: A Survey
Luis Roldão
Raoul de Charette
Anne Verroust-Blondet
[J]. International Journal of Computer Vision, 2022, 130 : 1978 - 2005
[9] 3D Semantic Scene Completion: A Survey
Roldao, Luis
de Charette, Raoul
Verroust-Blondet, Anne
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (08) : 1978 - 2005
[10] Correction to: Instance-Aware Scene Layout Forecasting
Xiaotian Qiao
Quanlong Zheng
Ying Cao
Rynson W. H. Lau
[J]. International Journal of Computer Vision, 2022, 130 (3) : 883 - 883
