SegCLIP: Multimodal Visual-Language and Prompt Learning for High-Resolution Remote Sensing Semantic Segmentation

Cited by: 0
Authors
Zhang, Shijie [1, 2]
Zhang, Bin [1, 2]
Wu, Yuntao [1, 2]
Zhou, Huabing [1, 2]
Jiang, Junjun [3]
Ma, Jiayi [4]
Affiliations
[1] Wuhan Inst Technol, Sch Comp Sci & Technol, Wuhan 430205, Peoples R China
[2] Wuhan Inst Technol, Hubei Prov Key Lab Intelligent Robots, Wuhan 430205, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[4] Wuhan Univ, Sch Elect Informat, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantic segmentation; Remote sensing; Semantics; Visualization; Transformers; Linguistics; Feature extraction; Sensors; Accuracy; Laser radar; Attention mechanism; contrastive language-image pretraining (CLIP); prompt learning
DOI
10.1109/TGRS.2024.3487576
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Remote sensing semantic segmentation is considered a key step in the intelligent interpretation of high-resolution remote sensing (HRRS) images, with widespread applications in fields such as hazard assessment, environmental monitoring, and urban planning. Recently, numerous deep learning-based semantic segmentation methods have emerged, achieving significant breakthroughs. However, the majority of current research still concentrates on representation learning in the visual feature space, with the potential of multimodal data sources yet to be fully explored. In recent years, the foundational visual language model, namely contrastive language-image pretraining (CLIP), has established a new paradigm in the visual field, demonstrating excellent generalization capabilities and deep semantic understanding across a variety of tasks. Inspired by prompt learning, we propose a prompting approach based on linguistic descriptions to enable CLIP to generate semantically distinct contextual information for remote sensing images. We introduce the SegCLIP network architecture, a novel framework specifically designed for semantic segmentation of HRRS images. Specifically, we have adapted CLIP to extract text information, thereby guiding the visual model in distinguishing among classes. Additionally, we have designed a cross-modal feature fusion (CFF) module that integrates linguistic and visual semantic features, ensuring semantic consistency across modalities. Finally, we have fully exploited the potential of text data and have used additional real text to refine ambiguous query features. Experimental evaluations confirm that the method exhibits superior performance on the LoveDA, iSAID, and UAVid public semantic segmentation datasets.
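The idea of letting class text guide per-pixel decisions can be illustrated with a toy sketch. This is not the SegCLIP architecture or its CFF module, whose details the abstract does not give; it only shows the generic CLIP-style step of comparing visual features with class text embeddings by cosine similarity. The function names, the hand-made 3-D embeddings, and the temperature value of 0.07 are all assumptions chosen for illustration; a real system would obtain embeddings from trained image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(pixel_features, text_embeddings, temperature=0.07):
    """Label each pixel feature with the class whose text embedding is closest.

    Similarity is the cosine between the pixel feature and each class text
    embedding, divided by a temperature (0.07 is an assumed value).
    """
    text_unit = [l2_normalize(t) for t in text_embeddings]
    labels, probs = [], []
    for feat in pixel_features:
        f = l2_normalize(feat)
        sims = [sum(a * b for a, b in zip(f, t)) / temperature for t in text_unit]
        p = softmax(sims)
        probs.append(p)
        labels.append(max(range(len(p)), key=p.__getitem__))
    return labels, probs


# Hypothetical toy data: three classes and three "pixels" in a 3-D feature space.
CLASS_NAMES = ["building", "water", "forest"]
TEXT_EMBEDDINGS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
PIXEL_FEATURES = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.8], [0.0, 2.0, 0.1]]

labels, probs = classify_pixels(PIXEL_FEATURES, TEXT_EMBEDDINGS)
print([CLASS_NAMES[i] for i in labels])  # ['building', 'forest', 'water']
```

In this sketch the text embeddings act as class prototypes, so changing the text description of a class changes the decision boundary without retraining the visual features, which is the general appeal of prompt-based approaches.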
Pages: 16