Learning Virtual View Selection for 3D Scene Semantic Segmentation

Cited by: 0

Authors:

Mu, Tai-Jiang [1,2]

Shen, Ming-Yuan [1,2]

Lai, Yu-Kun [3]

Hu, Shi-Min [1,2]

Affiliations:

[1] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 4AG, Wales

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

National Natural Science Foundation of China;

Keywords:

Three-dimensional displays; Semantic segmentation; Solid modeling; Semantics; Geometry; Task analysis; Deep reinforcement learning; Virtual view selection; 2D-3D joint learning; deep reinforcement learning; 3D semantic segmentation; RECONSTRUCTION; NETWORK;

DOI:

10.1109/TIP.2024.3421952

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images "as they are", i.e., only real captured 2D images are used. However, such captured 2D images may be redundant, with abundant occlusion and/or limited field of view (FoV), leading to poor performance for the current methods involving 2D inputs. In this paper, we propose a general learning framework for joint 2D-3D scene understanding by selecting informative virtual 2D views of the underlying 3D scene. We then feed both the 3D geometry and the generated virtual 2D views into any joint 2D-3D-input or pure 3D-input based deep neural models for improving 3D scene understanding. Specifically, we generate virtual 2D views based on an information score map learned from the current 3D scene semantic segmentation results. To achieve this, we formalize the learning of the information score map as a deep reinforcement learning process, which rewards good predictions using a deep neural network. To obtain a compact set of virtual 2D views that jointly cover informative surfaces of the 3D scene as much as possible, we further propose an efficient greedy virtual view coverage strategy in the normal-sensitive 6D space, including 3-dimensional point coordinates and 3-dimensional normal. We have validated our proposed framework for various joint 2D-3D-input or pure 3D-input based deep neural models on two real-world 3D scene datasets, i.e., ScanNet v2 and S3DIS, and the results demonstrate that our method obtains a consistent gain over baseline models and achieves new top accuracy for joint 2D and 3D scene semantic segmentation.
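The greedy coverage idea mentioned in the abstract can be illustrated with a generic weighted greedy selection. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `greedy_view_selection`, the `coverage` mapping (candidate view to the set of surface elements it sees), the per-element `scores`, and the `budget` are all hypothetical. Each element index is assumed to stand for one discretized cell of the 6D position-plus-normal space, and the scores are assumed to come from some information score map.

```python
from typing import Dict, List, Optional, Set


def greedy_view_selection(
    coverage: Dict[str, Set[int]],
    scores: List[float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` views; each step takes the view that adds the
    largest total score over elements not yet covered (hypothetical helper)."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(budget):
        best_view: Optional[str] = None
        best_gain = 0.0
        for view, elems in coverage.items():
            if view in chosen:
                continue
            # Marginal gain: score mass of the elements this view newly covers.
            gain = sum(scores[i] for i in elems - covered)
            if gain > best_gain:
                best_view, best_gain = view, gain
        if best_view is None:  # no remaining view adds any score
            break
        chosen.append(best_view)
        covered |= coverage[best_view]
    return chosen


# Toy data (invented for illustration): three candidate views, six elements.
coverage = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
scores = [0.9, 0.1, 0.5, 0.2, 0.8, 0.7]
selected = greedy_view_selection(coverage, scores, budget=2)
print(selected)
```

Stopping early when no view adds score keeps the selected set compact, which matches the stated goal of covering informative surfaces with few views. How the paper actually builds the 6D cells and scores is not specified in this record.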


Pages: 4159-4172

Page count: 14
