Semantic segmentation is a key step in the intelligent interpretation of high-resolution remote sensing (HRRS) images, with widespread applications in fields such as hazard assessment, environmental monitoring, and urban planning. Numerous deep learning-based semantic segmentation methods have recently emerged and achieved significant breakthroughs. However, most current research still concentrates on representation learning in the visual feature space, leaving the potential of multimodal data sources largely unexplored. In recent years, the foundational vision-language model contrastive language-image pretraining (CLIP) has established a new paradigm in computer vision, demonstrating strong generalization and deep semantic understanding across a variety of tasks. Inspired by prompt learning, we propose a prompting approach based on linguistic descriptions that enables CLIP to generate semantically distinct contextual information for remote sensing images. Building on this, we introduce SegCLIP, a novel network architecture specifically designed for semantic segmentation of HRRS images. Specifically, we adapt CLIP to extract textual information that guides the visual model in distinguishing among classes. In addition, we design a cross-modal feature fusion (CFF) module that integrates linguistic and visual semantic features, ensuring semantic consistency across modalities. Finally, to fully exploit the potential of textual data, we use additional real text to refine ambiguous query features. Experimental evaluations on the LoveDA, iSAID, and UAVid public semantic segmentation datasets confirm that the proposed method achieves superior performance.
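The following is a minimal sketch, not the paper's exact design, of the general idea described above: class-name prompts are encoded with a frozen CLIP text encoder, and a cross-attention block (standing in for a CFF-style module) fuses the resulting text embeddings with per-pixel visual features. The prompt template, class list (borrowed from LoveDA), module names, and dimensions are illustrative assumptions.

```python
# Sketch of text-guided feature fusion with a frozen CLIP text encoder.
# CrossModalFusion is a hypothetical stand-in for a CFF-style module.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

class CrossModalFusion(nn.Module):
    """Illustrative cross-modal fusion: visual tokens attend to class-text embeddings."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (B, H*W, C) flattened visual features
        # txt_tokens: (B, K, C) one embedding per class prompt
        fused, _ = self.attn(query=vis_tokens, key=txt_tokens, value=txt_tokens)
        return self.norm(vis_tokens + fused)  # residual keeps the visual content

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

# Example classes from LoveDA; the prompt template is an assumption.
class_names = ["background", "building", "road", "water", "barren", "forest", "agriculture"]
prompts = clip.tokenize([f"a remote sensing image of {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(prompts).float()            # (K, 512)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)    # unit-normalize as in CLIP

B, H, W, C = 2, 32, 32, 512
vis = torch.randn(B, H * W, C, device=device)                # stand-in for backbone features
fusion = CrossModalFusion(dim=C).to(device)
out = fusion(vis, text_emb.unsqueeze(0).expand(B, -1, -1))   # (B, H*W, C) text-aware features
```

In this sketch the text embeddings act as keys and values so that each spatial location can borrow class-discriminative semantics from language, which mirrors the stated goal of keeping linguistic and visual features semantically consistent; the actual CFF module in the paper may differ.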