SegCLIP: Multimodal Visual-Language and Prompt Learning for High-Resolution Remote Sensing Semantic Segmentation

Authors
Zhang, Shijie [1, 2]
Zhang, Bin [1, 2]
Wu, Yuntao [1, 2]
Zhou, Huabing [1, 2]
Jiang, Junjun [3 ]
Ma, Jiayi [4 ]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Technol, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robots, Wuhan 430205, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[4] Wuhan Univ, Sch Elect Informat, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Remote sensing; Semantics; Visualization; Transformers; Linguistics; Feature extraction; Sensors; Accuracy; Laser radar; Attention mechanism; contrastive language-image pretraining (CLIP); prompt learning
DOI
10.1109/TGRS.2024.3487576
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Remote sensing semantic segmentation is considered a key step in the intelligent interpretation of high-resolution remote sensing (HRRS) images, with widespread applications in fields such as hazard assessment, environmental monitoring, and urban planning. Recently, numerous deep learning-based semantic segmentation methods have emerged, achieving significant breakthroughs. However, the majority of current research still concentrates on representation learning in the visual feature space, with the potential of multimodal data sources yet to be fully explored. In recent years, the foundational visual language model, namely contrastive language-image pretraining (CLIP), has established a new paradigm in the visual field, demonstrating excellent generalization capabilities and deep semantic understanding across a variety of tasks. Inspired by prompt learning, we propose a prompting approach based on linguistic descriptions to enable CLIP to generate semantically distinct contextual information for remote sensing images. We introduce the SegCLIP network architecture, a novel framework specifically designed for semantic segmentation of HRRS images. Specifically, we have adapted CLIP to extract text information, thereby guiding the visual model in distinguishing among classes. Additionally, we have designed a cross-modal feature fusion (CFF) module that integrates linguistic and visual semantic features, ensuring semantic consistency across modalities. Finally, we have fully exploited the potential of text data and have used additional real text to refine ambiguous query features. Experimental evaluations confirm that the method exhibits superior performance on the LoveDA, iSAID, and UAVid public semantic segmentation datasets.
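The abstract does not say how the text features guide the visual model in distinguishing among classes, so the following is only an illustrative sketch of one common CLIP-style scheme, not the SegCLIP method itself: each pixel feature is compared with one text embedding per class by cosine similarity, and the pixel takes the label of the most similar class. The function names, class names, and 2-D toy embeddings are all hypothetical; in practice the vectors would come from a visual encoder and a text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_pixels(pixel_features, text_embeddings):
    """Label each pixel with the class whose text embedding is most similar.

    pixel_features: list of per-pixel feature vectors (assumed visual encoder output).
    text_embeddings: dict of class name -> embedding (assumed text encoder output).
    """
    names = list(text_embeddings)
    texts = [l2_normalize(text_embeddings[name]) for name in names]
    labels = []
    for feat in pixel_features:
        f = l2_normalize(feat)
        # Cosine similarity reduces to a dot product once both sides are unit length.
        sims = [sum(a * b for a, b in zip(f, t)) for t in texts]
        labels.append(names[sims.index(max(sims))])
    return labels


# Toy 2-D embeddings, made up for illustration only.
text_embeddings = {"water": [1.0, 0.0], "building": [0.0, 1.0]}
pixel_features = [[0.9, 0.1], [0.2, 0.8], [3.0, 1.0]]
labels = classify_pixels(pixel_features, text_embeddings)
```

Because both sides are normalized, the third pixel is labeled by direction rather than magnitude, which is the usual reason cosine similarity is preferred over a raw dot product in this kind of matching.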
Pages: 16
Related papers
50 in total
  • [1] VTPL: Visual and text prompt learning for visual-language models
    Sun, Bo
    Wu, Zhichao
    Zhang, Hao
    He, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [2] UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images
    Chang, Zhanyuan
    Xu, Mingyu
    Wei, Yuwen
    Lian, Jie
    Zhang, Chongming
    Li, Chuanjiang
    SENSORS, 2024, 24 (20)
  • [3] Dual decoupling semantic segmentation model for high-resolution remote sensing images
    Liu S.
    Li X.
    Yu M.
    Xing G.
    Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2023, 52 (04): 638-647
  • [4] Advancing high-resolution remote sensing: a compact and powerful approach to semantic segmentation
    Zhang, Hua
    Jiang, Zhengang
    Xu, Jun
    Pan, Xin
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (18): 6860-6883
  • [5] SEMANTIC SEGMENTATION OF HIGH-RESOLUTION REMOTE SENSING IMAGES USING AN IMPROVED TRANSFORMER
    Liu, Yuheng
    Mei, Shaohui
    Zhang, Shun
    Wang, Ye
    He, Mingyi
    Du, Qian
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022: 3496-3499
  • [6] A Deformable Attention Network for High-Resolution Remote Sensing Images Semantic Segmentation
    Zuo, Renxiang
    Zhang, Guangyun
    Zhang, Rongting
    Jia, Xiuping
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [7] Dynamic High-Resolution Network for Semantic Segmentation in Remote-Sensing Images
    Guo, Shichen
    Yang, Qi
    Xiang, Shiming
    Wang, Pengfei
    Wang, Xuezhi
    REMOTE SENSING, 2023, 15 (09)
  • [8] Edge Guidance Network for Semantic Segmentation of High-Resolution Remote Sensing Images
    Ni, Yue
    Liu, Jiahang
    Cui, Jian
    Yang, Yuze
    Wang, Xiaozhen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16: 9809-9822
  • [9] Multiscale Cascaded Network for the Semantic Segmentation of High-Resolution Remote Sensing Images
    Zhang, Xiaolu
    Wang, Zhaoshun
    Wei, Anlei
    CANADIAN JOURNAL OF REMOTE SENSING, 2023, 49 (01)
  • [10] LMFNet: Lightweight Multimodal Fusion Network for high-resolution remote sensing image segmentation
    Wang, Tong
    Chen, Guanzhou
    Zhang, Xiaodong
    Liu, Chenxi
    Wang, Jiaqi
    Tan, Xiaoliang
    Zhou, Wenlin
    He, Chanjuan
    PATTERN RECOGNITION, 2025, 164