RGB-D Semantic Segmentation and Label-Oriented Voxelgrid Fusion for Accurate 3D Semantic Mapping

Citations: 32
Authors
Shi, Wenjun [1]
Xu, Jingwei [2]
Zhu, Dongchen [1]
Zhang, Guanghui [1, 3]
Wang, Xianshun [1, 3]
Li, Jiamao [1, 3]
Zhang, Xiaolin [1, 3, 4]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Bion Vis Syst Lab, State Key Lab Transducer Technol, Shanghai 200050, Peoples R China
[2] SenseTime Res, Shanghai 200233, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
[4] Shanghai Tech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Three-dimensional displays; Two dimensional displays; Streaming media; Feature extraction; Image segmentation; Labeling; Semantic mapping; semantic fusion; discriminatory mask;
DOI
10.1109/TCSVT.2021.3056726
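Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract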
3D semantic maps play an increasingly important role in a wide variety of applications, especially for task-driven robots. In this paper, we present a semantic mapping methodology for building 3D semantic maps from RGB-D scans. In contrast to existing methods that use 3D annotations as supervision, we focus on accurate 2D frame labeling and combine the labels in 3D space using a semantic fusion mechanism. For scene parsing, we propose a two-stream network with a novel discriminatory mask loss that thoroughly extracts and fuses RGB and depth information, yielding stable semantic segmentation. The discriminatory mask guides the cross-entropy loss function and accounts for the differing influence of pixels on back-propagation, which reduces the harmful effects of depth noise or fallible annotation at object edges. Once the correspondences between frames are provided, the semantic frames are fused in unified 3D coordinates using a novel label-oriented voxelgrid filter. By introducing a label-oriented statistical principle into labeled point clouds, this filter ensures intra-frame spatial continuity and inter-frame spatiotemporal consistency. To avoid unfavorable interference between uncorrelated frames, we further propose an adaptive grouping algorithm that applies a view frustum filter to group frames with sufficient overlap into a segment. Finally, we demonstrate the effectiveness of the proposed method on the 2D/3D semantic label benchmarks of the ScanNetv2 and Cityscapes datasets.
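To make the idea of fusing per-frame labels in a shared voxel grid concrete, here is a minimal sketch. It is not the paper's label-oriented voxelgrid filter, whose statistical rule is not given in the abstract; it assumes the simplest possible rule, a majority vote over the labels of all points falling into the same voxel. The function name `fuse_labels_by_voxel` and the parameter `voxel_size` are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict
from math import floor


def fuse_labels_by_voxel(points, labels, voxel_size):
    """Assign one label per occupied voxel by majority vote.

    Assumption: plain majority voting stands in for the paper's
    label-oriented statistical principle, which the abstract does not specify.

    points: iterable of (x, y, z) tuples already in a common world frame
    labels: iterable of integer class ids, one per point
    voxel_size: edge length of a cubic voxel, in the same unit as the points
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    votes = defaultdict(Counter)
    for (x, y, z), label in zip(points, labels):
        # Integer voxel index of the point along each axis.
        key = (
            floor(x / voxel_size),
            floor(y / voxel_size),
            floor(z / voxel_size),
        )
        votes[key][label] += 1

    # Keep the most frequent label in each voxel.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


if __name__ == "__main__":
    # Three points from (possibly different) frames share one voxel;
    # two vote for class 1 and one for class 2. A fourth point lies elsewhere.
    pts = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (0.03, 0.03, 0.01), (0.12, 0.0, 0.0)]
    lbl = [1, 1, 2, 5]
    print(fuse_labels_by_voxel(pts, lbl, voxel_size=0.05))
    # prints {(0, 0, 0): 1, (2, 0, 0): 5}
```

Voting per voxel lets an isolated mislabeled point be overruled by its neighbors, which is one simple way to obtain the spatial and cross-frame consistency the abstract describes. A real pipeline would also need the frame poses to bring points into the common frame first.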
Pages: 183-197
Page count: 15