Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline

Cited by: 7
Authors
Ji, Wei [1 ]
Li, Jingjing [1 ]
Bian, Cheng [2 ]
Zhou, Zongwei [3 ]
Zhao, Jiaying [2 ]
Yuille, Alan [3 ]
Cheng, Li [1 ]
Affiliations
[1] University of Alberta, Edmonton, AB T6G 2M7, Canada
[2] ByteDance, Beijing, People's Republic of China
[3] Johns Hopkins University, Baltimore, MD, USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
ATTENTION; NETWORK; FUSION;
DOI
10.1109/CVPR52729.2023.00112
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robust and reliable semantic segmentation in complex scenes is crucial for many real-life applications such as safe autonomous driving and nighttime rescue. Most approaches use RGB images as input. These approaches, however, work well only in favorable weather conditions; under adverse conditions such as rain, overexposure, or low light, they often fail to deliver satisfactory results. This has led to the recent investigation into multispectral semantic segmentation, where RGB and thermal infrared (RGBT) images are both used as input. This gives rise to significantly more robust segmentation of image objects in complex scenes and under adverse conditions. Nevertheless, the present focus on single RGBT image input prevents existing methods from adequately addressing dynamic real-world scenes. Motivated by the above observations, in this paper, we set out to address a relatively new task of semantic segmentation of multispectral video input, which we refer to as Multispectral Video Semantic Segmentation, or MVSS for short. An in-house MVSeg dataset is thus curated, consisting of 738 calibrated RGB and thermal videos, accompanied by 3,545 fine-grained pixel-level semantic annotations of 26 categories. Our dataset contains a wide range of challenging urban scenes in both daytime and nighttime. Moreover, we propose an effective MVSS baseline, dubbed MVNet, which is to our knowledge the first model to jointly learn semantic representations from multispectral and temporal contexts. Comprehensive experiments are conducted using various semantic segmentation models on the MVSeg dataset. Empirically, the use of multispectral video input is shown to lead to significant improvement in semantic segmentation; the effectiveness of our MVNet baseline has also been verified.
Pages: 1094-1104
Page count: 11
Related papers
50 in total
  • [1] Multispectral Semantic Segmentation for UAVs: A Benchmark Dataset and Baseline
    Li, Qiusheng
    Yuan, Hang
    Fu, Tianning
    Yu, Zhibin
    Zheng, Bing
    Chen, Shuguo
    [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [2] Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
    Hwang, Soonmin
    Park, Jaesik
    Kim, Namil
    Choi, Yukyung
    Kweon, In So
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015: 1037-1045
  • [3] Multispectral Benchmark Dataset and Baseline for Forklift Collision Avoidance
    Kim, Hyeongjun
    Kim, Taejoo
    Jo, Won
    Kim, Jiwon
    Shin, Jeongmin
    Han, Daechan
    Hwang, Yujin
    Choi, Yukyung
    [J]. SENSORS, 2022, 22 (20)
  • [4] Semantic Segmentation of Underwater Imagery: Dataset and Benchmark
    Islam, Md Jahidul
    Edge, Chelsey
    Xiao, Yuyang
    Luo, Peigen
    Mehtaz, Muntaqim
    Morse, Christopher
    Enan, Sadman Sakib
    Sattar, Junaed
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 1769-1776
  • [5] A pothole video dataset for semantic segmentation
    Ihsan, Muhammad
    Amrizal, Muhammad Alfian
    Harjoko, Agus
    [J]. DATA IN BRIEF, 2024, 53
  • [6] A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
    Perazzi, F.
    Pont-Tuset, J.
    McWilliams, B.
    Van Gool, L.
    Gross, M.
    Sorkine-Hornung, A.
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 724-732
  • [7] TOWARDS A BENCHMARK EO SEMANTIC SEGMENTATION DATASET FOR UNCERTAINTY QUANTIFICATION
    Wasif, Dawood
    Wang, Yuanyuan
    Shahzad, Muhammad
    Triebel, Rudolph
    Zhu, Xiao Xiang
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023: 5018-5021
  • [8] A Multitask Benchmark Dataset for Satellite Video: Object Detection, Tracking, and Segmentation
    Li, Shengyang
    Zhou, Zhuang
    Zhao, Manqi
    Yang, Jian
    Guo, Weilong
    Lv, Yixuan
    Kou, Longxuan
    Wang, Han
    Gu, Yanfeng
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [9] Video Colorization Dataset and Benchmark
    Abeysinghe, Chamath
    Wijesinghe, Thejan
    Wijayakoon, Chanuka
    Jayathilake, Lahiru
    Thayasivam, Uthayasanker
    [J]. 2019 MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON) / 5TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2019: 37-42
  • [10] SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images
    Ji, Wei
    Li, Jingjing
    Bian, Cheng
    Zhang, Zhicheng
    Cheng, Li
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 3307-3316