Self-Supervised Model Adaptation for Multimodal Semantic Segmentation

Authors
Abhinav Valada
Rohit Mohan
Wolfram Burgard
Affiliations
[1] University of Freiburg
[2] Toyota Research Institute
Keywords
Semantic segmentation; Multimodal fusion; Scene understanding; Model adaptation; Deep learning;
Abstract
Learning to reliably perceive and understand the scene is an integral enabler for robots to operate in the real world. This problem is inherently challenging due to the multitude of object types as well as appearance changes caused by varying illumination and weather conditions. Leveraging complementary modalities can enable learning of semantically richer representations that are resilient to such perturbations. Despite the tremendous progress in recent years, most multimodal convolutional neural network approaches directly concatenate feature maps from individual modality streams, rendering the model incapable of focusing only on the relevant complementary information for fusion. To address this limitation, we propose a multimodal semantic segmentation framework that dynamically adapts the fusion of modality-specific features while being sensitive to the object category, spatial location and scene context in a self-supervised manner. Specifically, we propose an architecture consisting of two modality-specific encoder streams that fuse intermediate encoder representations into a single decoder using our proposed self-supervised model adaptation fusion mechanism, which optimally combines complementary features. As intermediate representations are not aligned across modalities, we introduce an attention scheme for better correlation.
In addition, we propose a computationally efficient unimodal segmentation architecture termed AdapNet++ that incorporates a new encoder with multiscale residual units and an efficient atrous spatial pyramid pooling that has a larger effective receptive field with more than 10× fewer parameters, complemented with a strong decoder with a multi-resolution supervision scheme that recovers high-resolution details. Comprehensive empirical evaluations on Cityscapes, Synthia, SUN RGB-D, ScanNet and Freiburg Forest benchmarks demonstrate that both our unimodal and multimodal architectures achieve state-of-the-art performance while simultaneously being efficient in terms of parameters and inference time as well as demonstrating substantial robustness in adverse perceptual conditions.
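The contrast the abstract draws between plain concatenation and adaptive fusion can be illustrated with a generic gating sketch. This is not the authors' implementation: the 1x1 convolutions, the random weights, the reduction ratio r, and every name below (conv1x1, gated_fusion, w_squeeze, w_excite, w_out) are assumptions made only for illustration. The sketch stacks two modality feature maps, computes a per-channel, per-location gate from the stacked features, and reweights them before projecting back to the original channel count.

```python
import numpy as np


def conv1x1(x, w):
    """Apply a 1x1 convolution. x is (C_in, H, W); w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(feat_a, feat_b, w_squeeze, w_excite, w_out):
    """Fuse two (C, H, W) feature maps with a learned-style gate (hypothetical sketch)."""
    # Plain concatenation along the channel axis: (2C, H, W).
    stacked = np.concatenate([feat_a, feat_b], axis=0)
    # Bottleneck followed by ReLU, then expansion back to 2C channels.
    hidden = np.maximum(conv1x1(stacked, w_squeeze), 0.0)
    # Gate values lie in (0, 1) and vary per channel and per spatial location.
    gate = sigmoid(conv1x1(hidden, w_excite))
    # Reweight the stacked features, then project down to C channels.
    fused = conv1x1(stacked * gate, w_out)
    return fused, gate


# Toy inputs: random features stand in for two modality-specific encoder outputs.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4  # r is an assumed bottleneck reduction ratio
feat_rgb = rng.standard_normal((C, H, W))
feat_depth = rng.standard_normal((C, H, W))
w_squeeze = rng.standard_normal((2 * C // r, 2 * C)) * 0.1
w_excite = rng.standard_normal((2 * C, 2 * C // r)) * 0.1
w_out = rng.standard_normal((C, 2 * C)) * 0.1

fused, gate = gated_fusion(feat_rgb, feat_depth, w_squeeze, w_excite, w_out)
print(fused.shape, gate.shape)
```

In a trained network the three weight matrices would be learned together with the segmentation loss, so the gate could suppress one modality's channels where the other is more informative. Here the weights are random and the example only demonstrates the data flow and shapes.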
Pages: 1239-1285
Page count: 46
Related papers
50 in total
  • [21] Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation
    Hoyer, Lukas
    Dai, Dengxin
    Chen, Yuhua
    Koring, Adrian
    Saha, Suman
    Van Gool, Luc
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 11125 - 11135
  • [22] Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
    Chen, Qi
    Yang, Lingxiao
    Lai, Jianhuang
    Xie, Xiaohua
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4278 - 4288
  • [23] Performance Prediction for Semantic Segmentation by a Self-Supervised Image Reconstruction Decoder
    Baer, Andreas
    Klingner, Marvin
    Loehdefink, Jonas
    Hueger, Fabian
    Schlicht, Peter
    Fingscheidt, Tim
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4398 - 4407
  • [24] Self-Supervised Monocular Depth Estimation Method for Joint Semantic Segmentation
    Song X.
    Hu H.
    Ning J.
    Liang L.
    Lu X.
    Hei X.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): : 1336 - 1347
  • [25] Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation
    Islam, Ashraful
    Lundell, Ben
    Sawhney, Harpreet
    Sinha, Sudipta N.
    Morales, Peter
    Radke, Richard J.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5613 - 5622
  • [26] Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene
    Shrestha, Sulabh
    Li, Yimeng
    Kosecka, Jana
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 625 - 635
  • [27] Semantic Segmentation of Remote Sensing Images With Self-Supervised Semantic-Aware Inpainting
    He, Shuyi
    Li, Qingyong
    Liu, Yang
    Wang, Wen
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [28] Boosting Multi-Modal Unsupervised Domain Adaptation for LiDAR Semantic Segmentation by Self-Supervised Depth Completion
    Cardace, Adriano
    Conti, Andrea
    Ramirez, Pierluigi Zama
    Spezialetti, Riccardo
    Salti, Samuele
    Stefano, Luigi Di
    IEEE ACCESS, 2023, 11 : 85155 - 85164
  • [29] Self-supervised Test-Time Adaptation for Medical Image Segmentation
    Li, Hao
    Liu, Han
    Hu, Dewei
    Wang, Jiacheng
    Johnson, Hans
    Sherbini, Omar
    Gavazzi, Francesco
    D'Aiello, Russell
    Vanderver, Adeline
    Long, Jeffrey
    Paulsen, Jane
    Oguz, Ipek
    MACHINE LEARNING IN CLINICAL NEUROIMAGING, MLCN 2022, 2022, 13596 : 32 - 41
  • [30] Self-Supervised Tumor Segmentation With Sim2Real Adaptation
    Zhang, Xiaoman
    Xie, Weidi
    Huang, Chaoqin
    Zhang, Ya
    Chen, Xin
    Tian, Qi
    Wang, Yanfeng
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (09) : 4373 - 4384