Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline

Times cited: 7
Authors
Ji, Wei [1 ]
Li, Jingjing [1 ]
Bian, Cheng [2 ]
Zhou, Zongwei [3 ]
Zhao, Jiaying [2 ]
Yuille, Alan [3 ]
Cheng, Li [1 ]
Affiliations
[1] Univ Alberta, Edmonton, AB T6G 2M7, Canada
[2] ByteDance, Beijing, Peoples R China
[3] Johns Hopkins Univ, Baltimore, MD USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
ATTENTION; NETWORK; FUSION;
DOI
10.1109/CVPR52729.2023.00112
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robust and reliable semantic segmentation in complex scenes is crucial for many real-life applications, such as safe autonomous driving and nighttime rescue. Most approaches take RGB images as input. However, they work well only in favorable weather conditions; under adverse conditions such as rain, overexposure, or low light, they often fail to deliver satisfactory results. This has motivated recent work on multispectral semantic segmentation, in which both RGB and thermal infrared (RGBT) images are used as input, yielding significantly more robust segmentation of objects in complex scenes and under adverse conditions. Nevertheless, the current focus on single RGBT image input prevents existing methods from adequately handling dynamic real-world scenes. Motivated by these observations, in this paper we address a relatively new task: semantic segmentation of multispectral video input, which we refer to as Multispectral Video Semantic Segmentation, or MVSS for short. To this end, we curate an in-house MVSeg dataset consisting of 738 calibrated RGB and thermal videos, accompanied by 3,545 fine-grained pixel-level semantic annotations covering 26 categories. The dataset contains a wide range of challenging urban scenes in both daytime and nighttime. We further propose an effective MVSS baseline, dubbed MVNet, which is, to our knowledge, the first model to jointly learn semantic representations from multispectral and temporal contexts. Comprehensive experiments are conducted with various semantic segmentation models on the MVSeg dataset. Empirically, using multispectral video input leads to significant improvements in semantic segmentation, and the effectiveness of the MVNet baseline is also verified.
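The abstract does not say how MVNet combines the two modalities, so the sketch below is for reference only and is not MVNet's design. It shows the simplest way to feed calibrated RGBT video to a segmentation model: early fusion, where each RGB frame (3 channels) is stacked with its pixel-aligned thermal frame (1 channel) at the same time step. The nested-list frame layout and the helper names `fuse_rgbt_clip` and `make_frame` are hypothetical choices made for this illustration.

```python
from typing import List

# Hypothetical layout: a frame is a list of channels; each channel is an H x W grid.
Channel = List[List[float]]
Frame = List[Channel]


def fuse_rgbt_clip(rgb_clip: List[Frame], thermal_clip: List[Frame]) -> List[Frame]:
    """Early-fuse an RGB clip with its calibrated thermal clip by stacking channels.

    Assumes both clips have the same number of frames and that the RGB frame
    (3 channels) and thermal frame (1 channel) at each time step are pixel-aligned,
    so every fused frame has 4 channels. This is a generic baseline, not MVNet.
    """
    if len(rgb_clip) != len(thermal_clip):
        raise ValueError("RGB and thermal clips must have the same length")
    fused = []
    for t, (rgb, thermal) in enumerate(zip(rgb_clip, thermal_clip)):
        h, w = len(rgb[0]), len(rgb[0][0])
        # Check that every channel of both modalities shares the same H x W grid.
        for ch in list(rgb) + list(thermal):
            if len(ch) != h or any(len(row) != w for row in ch):
                raise ValueError(f"frame {t}: channels are not spatially aligned")
        fused.append(list(rgb) + list(thermal))  # channel order: R, G, B, T
    return fused


def make_frame(channels: int, h: int, w: int, value: float) -> Frame:
    """Build a constant-valued frame for demonstration (hypothetical helper)."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]


T, H, W = 4, 2, 3
rgb_clip = [make_frame(3, H, W, 0.5) for _ in range(T)]
thermal_clip = [make_frame(1, H, W, 0.9) for _ in range(T)]
clip = fuse_rgbt_clip(rgb_clip, thermal_clip)
print(len(clip), len(clip[0]), len(clip[0][0]), len(clip[0][0][0]))  # 4 4 2 3
```

Early fusion treats thermal data as an extra input channel and ignores temporal context. A video model such as the one described in the abstract would additionally need to relate features across the time steps of the clip.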
Pages: 1094-1104
Number of pages: 11