RGB-D Semantic Segmentation and Label-Oriented Voxelgrid Fusion for Accurate 3D Semantic Mapping

Cited by: 32

Authors:

Shi, Wenjun [1]

Xu, Jingwei [2]

Zhu, Dongchen [1]

Zhang, Guanghui [1,3]

Wang, Xianshun [1,3]

Li, Jiamao [1,3]

Zhang, Xiaolin [1,3,4]

Affiliations:

[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Bion Vis Syst Lab, State Key Lab Transducer Technol, Shanghai 200050, Peoples R China

[2] SenseTime Res, Shanghai 200233, Peoples R China

[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China

[4] Shanghai Tech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022, Vol. 32, No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Three-dimensional displays; Two dimensional displays; Streaming media; Feature extraction; Image segmentation; Labeling; Semantic mapping; semantic fusion; discriminatory mask;

DOI:

10.1109/TCSVT.2021.3056726

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

3D semantic maps play an increasingly important role in a wide variety of applications, especially for task-driven robots. In this paper, we present a semantic mapping method that builds a 3D semantic map from RGB-D scans. In contrast to existing methods that use 3D annotations as supervision, we focus on accurate 2D frame labeling and combine the labels in 3D space using a semantic fusion mechanism. For scene parsing, we propose a two-stream network with a novel discriminatory mask loss that extracts and fuses RGB and depth information to achieve stable semantic segmentation. The discriminatory mask guides the cross-entropy loss function and accounts for the different influence of individual pixels on back-propagation, which reduces the harmful effects of depth noise and of unreliable annotations at object edges. Once the correspondences between frames are available, the semantically labeled frames are fused in a unified 3D coordinate system using a novel label-oriented voxelgrid filter. By introducing a label-oriented statistical principle into the labeled point clouds, the filter ensures intra-frame spatial continuity and inter-frame spatiotemporal consistency. To avoid interference between uncorrelated frames, we further propose an adaptive grouping algorithm that applies a view frustum filter to group frames with sufficient overlap into a segment. Finally, we demonstrate the effectiveness of the proposed method on the 2D/3D semantic label benchmarks of the ScanNetv2 and Cityscapes datasets.
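
The abstract does not specify how the label-oriented voxelgrid filter computes its statistics, so the following is only a minimal sketch of the general idea of fusing per-point labels inside a voxel grid. It assumes a plain majority vote per voxel, which may differ from the paper's filter; the function name `fuse_labels_by_voxel`, the `voxel_size` parameter, and the sample points are all hypothetical.

```python
from collections import Counter, defaultdict
from math import floor


def fuse_labels_by_voxel(points, labels, voxel_size):
    """Give each occupied voxel the most frequent label among its points.

    Assumption: simple majority voting stands in for the paper's
    label-oriented statistical principle, which the abstract does not detail.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    votes = defaultdict(Counter)
    for (x, y, z), label in zip(points, labels):
        # Integer voxel index of the point along each axis.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        votes[key][label] += 1

    # Keep one label per voxel: the one with the highest vote count.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Hypothetical labeled points, e.g. back-projected from several 2D frames.
points = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (0.03, 0.03, 0.01), (0.12, 0.02, 0.03)]
labels = ["chair", "chair", "floor", "floor"]
fused = fuse_labels_by_voxel(points, labels, voxel_size=0.05)
```

In this toy input, the three points that share a voxel carry two "chair" votes and one "floor" vote, so the outlier label is suppressed, which illustrates how per-voxel statistics can enforce consistency across overlapping frames.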

Pages: 183-197

Page count: 15
