Temporally Consistent Semantic Segmentation using Spatially Aware Multi-view Semantic Fusion for Indoor RGB-D videos

Authors
Sun, Fengyuan [1]
Karaoglu, Sezer
Gevers, Theo
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
DOI
10.1109/ICCVW60793.2023.00459
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image semantic segmentation struggles to produce consistent and robust results across a sequence of video frames. The problem is more pronounced for indoor scenes, where small camera movements can lead to drastic appearance changes, occlusions, and loss of global context. To overcome these challenges, this paper proposes a novel approach that combines multi-view semantic fusion with spatial reasoning to produce view-invariant semantic features for temporally consistent semantic segmentation of indoor RGB-D videos. Experiments on the ScanNet dataset show that the proposed spatially aware multi-view fusion mechanism significantly improves on the state-of-the-art image semantic segmentation methods Mask2Former and ViT-Adapter. In particular, compared to Mask2Former, the proposed pipeline improves 2D mIoU, cross-view consistency, and temporal consistency by 5%, 9.9%, and 14.4%, respectively. Compared to ViT-Adapter, it improves the same metrics by 4.8%, 8.9%, and 10.9%.
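The 2D mIoU figure quoted in the abstract is a mean intersection-over-union score. As a rough illustration only, the sketch below computes the standard mIoU from a confusion matrix with NumPy. It is not the paper's evaluation code: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in the prediction or ground truth are all assumptions made for this example. The cross-view and temporal consistency metrics are not sketched because the abstract does not define them.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Standard mean IoU over classes (illustrative, not the paper's code).

    pred and target are integer label maps of the same shape.
    Pixels whose ground-truth label equals ignore_index are skipped
    (255 is an assumed convention here).
    """
    pred = np.asarray(pred).astype(np.int64).ravel()
    target = np.asarray(target).astype(np.int64).ravel()

    # Drop pixels that carry no ground-truth label.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # confusion[i, j] = number of pixels with ground truth i predicted as j.
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection

    # Average only over classes that occur in prediction or ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    prediction = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Predictions are assumed to lie in the range 0 to `num_classes - 1`.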
Pages: 4250-4259
Page count: 10