Context-Aggregated and SAM-Guided Network for ViT-Based Instance Segmentation in Remote Sensing Images

Cited by: 0
Authors
Liu, Shuangzhou [1 ,2 ,3 ]
Wang, Feng [1 ,2 ]
You, Hongjian [1 ,2 ,3 ]
Jiao, Niangang [1 ,2 ]
Zhou, Guangyao [1 ,2 ]
Zhang, Tingtao [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Technol Geospatial Informat Proc & Applica, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 101408, Peoples R China
Keywords
instance segmentation; remote sensing images; SAM; backbone
DOI
10.3390/rs16132472
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Instance segmentation of remote sensing images provides both object-level and pixel-level positioning information. Such pixel-level annotation has a wide range of uses in remote sensing and is of great value for environmental detection and resource management. However, optical images generally contain complex terrain and objects of changeable shape, and SAR images are affected by complex scattering phenomena, so the masks that traditional instance segmentation methods produce on remote sensing images are of low quality. Improving the mask quality of instance segmentation in remote sensing images is therefore a challenging task. Because a traditional two-stage instance segmentation method consists of a backbone, neck, bbox head, and mask head, the final mask quality depends on the product of the quality of all preceding stages. We therefore consider the difficulties that optical and SAR images pose for instance segmentation, make targeted improvements to the neck, bbox head, and mask head, and propose the Context-Aggregated and SAM-Guided Network (CSNet). In this network, the plain feature fusion pyramid network (PFFPN) generates a pyramid from the plain feature and provides feature maps at a scale appropriate to each instance for detection and segmentation. The context aggregation bbox head (CABH) uses the instance information together with the context information around the instance to address missed and false detections. The SAM-Guided mask head (SGMH) learns with SAM as a teacher and uses the learned knowledge to improve the edges of the mask. Experimental results show that CSNet significantly improves the quality of masks generated on optical and SAR images, achieving AP increments of 5.1% and 3.2% compared with other SOTA models.
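As a rough illustration of what "generating a pyramid for the plain feature" can mean, the sketch below derives several scales from one single-scale feature map using plain average pooling and nearest-neighbour upsampling. This is not the paper's PFFPN: the abstract does not describe its fusion design, and the function names, level names, and the choice of pooling and upsampling operators are all assumptions made for this example.

```python
from typing import Dict, List

Grid = List[List[float]]  # one feature channel as a 2-D list (illustrative only)


def avg_pool2x(x: Grid) -> Grid:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1]
             + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample2x(x: Grid) -> Grid:
    """Double the resolution by repeating every value in a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def simple_pyramid(feat: Grid) -> Dict[str, Grid]:
    """Derive four scales from one single-scale map (hypothetical level names)."""
    down = avg_pool2x(feat)
    return {
        "x2": upsample2x(feat),      # finer map, intended for small instances
        "x1": feat,                  # the original plain feature
        "x0.5": down,                # coarser map
        "x0.25": avg_pool2x(down),   # coarsest map, intended for large instances
    }


# Toy 4x4 single-channel feature map with values 0..15.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = simple_pyramid(feat)
shapes = {name: (len(g), len(g[0])) for name, g in pyramid.items()}
print(shapes)
```

In a detector, each level would then be routed to instances of a matching size; a learned design would replace the fixed pooling and repetition used here with trainable layers.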
Pages: 27