SegCLIP: Multimodal Visual-Language and Prompt Learning for High-Resolution Remote Sensing Semantic Segmentation

Authors
Zhang, Shijie [1, 2]
Zhang, Bin [1, 2]
Wu, Yuntao [1, 2]
Zhou, Huabing [1, 2]
Jiang, Junjun [3]
Ma, Jiayi [4]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Technol, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robots, Wuhan 430205, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[4] Wuhan Univ, Sch Elect Informat, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Remote sensing; Semantics; Visualization; Transformers; Linguistics; Feature extraction; Sensors; Accuracy; Laser radar; Attention mechanism; contrastive language-image pretraining (CLIP); prompt learning
DOI
10.1109/TGRS.2024.3487576
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Remote sensing semantic segmentation is considered a key step in the intelligent interpretation of high-resolution remote sensing (HRRS) images, with widespread applications in fields such as hazard assessment, environmental monitoring, and urban planning. Recently, numerous deep learning-based semantic segmentation methods have emerged, achieving significant breakthroughs. However, the majority of current research still concentrates on representation learning in the visual feature space, with the potential of multimodal data sources yet to be fully explored. In recent years, the foundational visual language model, namely contrastive language-image pretraining (CLIP), has established a new paradigm in the visual field, demonstrating excellent generalization capabilities and deep semantic understanding across a variety of tasks. Inspired by prompt learning, we propose a prompting approach based on linguistic descriptions to enable CLIP to generate semantically distinct contextual information for remote sensing images. We introduce the SegCLIP network architecture, a novel framework specifically designed for semantic segmentation of HRRS images. Specifically, we have adapted CLIP to extract text information, thereby guiding the visual model in distinguishing among classes. Additionally, we have designed a cross-modal feature fusion (CFF) module that integrates linguistic and visual semantic features, ensuring semantic consistency across modalities. Finally, we have fully exploited the potential of text data and have used additional real text to refine ambiguous query features. Experimental evaluations confirm that the method exhibits superior performance on the LoveDA, iSAID, and UAVid public semantic segmentation datasets.
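The idea of using class-name prompts to guide per-pixel classification can be illustrated with a small sketch. This is not the SegCLIP architecture or its CFF module; it only shows the general pattern of scoring each pixel feature against one text embedding per class by cosine similarity. The prompt template, the class names, and the toy embeddings (stand-ins for the output of a CLIP text encoder and a visual backbone) are all assumptions made for illustration.

```python
import math

# Hypothetical prompt template and class names, chosen only for illustration.
PROMPT_TEMPLATE = "a remote sensing image of {}"
CLASS_NAMES = ["building", "road", "water"]


def build_prompts(class_names, template=PROMPT_TEMPLATE):
    """Turn class names into text prompts that a text encoder would consume."""
    return [template.format(name) for name in class_names]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_logits(pixel_feature, text_embeddings):
    """Cosine similarity between one pixel feature and each class text embedding."""
    p = l2_normalize(pixel_feature)
    return [
        sum(a * b for a, b in zip(p, l2_normalize(t)))
        for t in text_embeddings
    ]


def segment(feature_map, text_embeddings):
    """Assign each pixel (H x W x D nested lists) the index of the best-matching class."""
    label_map = []
    for row in feature_map:
        label_row = []
        for pixel_feature in row:
            logits = cosine_logits(pixel_feature, text_embeddings)
            label_row.append(logits.index(max(logits)))
        label_map.append(label_row)
    return label_map


# Toy stand-ins: in practice these would come from a text encoder applied to
# build_prompts(CLASS_NAMES) and from a visual backbone, respectively.
toy_text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_feature_map = [
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]],
    [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],
]

prompts = build_prompts(CLASS_NAMES)
labels = segment(toy_feature_map, toy_text_embeddings)
```

In a real pipeline the text embeddings would be computed once per class set and reused for every image, so the per-pixel cost is a single matrix product between the visual feature map and the class embedding matrix.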
Pages: 16