Learning Virtual View Selection for 3D Scene Semantic Segmentation

Cited by: 0
Authors
Mu, Tai-Jiang [1, 2]
Shen, Ming-Yuan [1, 2]
Lai, Yu-Kun [3]
Hu, Shi-Min [1, 2]
Affiliations
[1] Tsinghua Univ, Minist Educ, Key Lab Pervas Comp, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 4AG, Wales
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Semantic segmentation; Solid modeling; Semantics; Geometry; Task analysis; Deep reinforcement learning; Virtual view selection; 2D-3D joint learning; deep reinforcement learning; 3D semantic segmentation; RECONSTRUCTION; NETWORK;
DOI
10.1109/TIP.2024.3421952
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images "as they are", i.e., only real captured 2D images are used. However, such captured 2D images may be redundant, with abundant occlusion and/or limited field of view (FoV), leading to poor performance for the current methods involving 2D inputs. In this paper, we propose a general learning framework for joint 2D-3D scene understanding by selecting informative virtual 2D views of the underlying 3D scene. We then feed both the 3D geometry and the generated virtual 2D views into any joint 2D-3D-input or pure 3D-input based deep neural models for improving 3D scene understanding. Specifically, we generate virtual 2D views based on an information score map learned from the current 3D scene semantic segmentation results. To achieve this, we formalize the learning of the information score map as a deep reinforcement learning process, which rewards good predictions using a deep neural network. To obtain a compact set of virtual 2D views that jointly cover informative surfaces of the 3D scene as much as possible, we further propose an efficient greedy virtual view coverage strategy in the normal-sensitive 6D space, including 3-dimensional point coordinates and 3-dimensional normal. We have validated our proposed framework for various joint 2D-3D-input or pure 3D-input based deep neural models on two real-world 3D scene datasets, i.e., ScanNet v2 and S3DIS, and the results demonstrate that our method obtains a consistent gain over baseline models and achieves new top accuracy for joint 2D and 3D scene semantic segmentation.
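The greedy coverage idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `covers` and `greedy_view_selection`, the distance and angle thresholds, and the occlusion-free visibility test are all assumptions made for illustration, and the per-point `scores` merely stand in for the learned information score map. Each surface point is treated as a 6D sample (3D position plus unit normal), and a candidate camera covers a point only when the point is close enough and its normal faces the camera.

```python
import math

def covers(cam, point, normal, max_dist=5.0, min_cos=0.5):
    """Hypothetical visibility test (ignores occlusion): the point is within
    max_dist of the camera and its unit normal faces the camera."""
    to_cam = [c - p for c, p in zip(cam, point)]
    dist = math.sqrt(sum(d * d for d in to_cam))
    if dist == 0.0 or dist > max_dist:
        return False
    cos_angle = sum(n * d for n, d in zip(normal, to_cam)) / dist
    return cos_angle >= min_cos

def greedy_view_selection(cams, points, normals, scores, k):
    """Pick up to k cameras, each time taking the one that adds the largest
    total score over points not yet covered."""
    cover_sets = [
        {i for i, (p, n) in enumerate(zip(points, normals)) if covers(cam, p, n)}
        for cam in cams
    ]
    covered, chosen = set(), []
    for _ in range(k):
        best, best_gain = None, 0.0
        for j, visible in enumerate(cover_sets):
            if j in chosen:
                continue
            gain = sum(scores[i] for i in visible - covered)
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no remaining camera adds any uncovered score
            break
        chosen.append(best)
        covered |= cover_sets[best]
    return chosen, covered

# Toy scene: two floor points facing up and one wall point facing +x.
points = [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
normals = [(0, 0, 1), (0, 0, 1), (1, 0, 0)]
scores = [1.0, 1.0, 3.0]           # placeholder for learned information scores
cams = [(0.5, 0, 2), (2, 0, 1)]    # candidate virtual camera positions
chosen, covered = greedy_view_selection(cams, points, normals, scores, k=2)
```

In this toy scene the second camera is picked first because it sees the high-score wall point; the first camera is then added for the one floor point still uncovered.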
Pages: 4159-4172
Number of pages: 14