Multi-LoRA Fine-Tuned Segment Anything Model for Urban Man-Made Object Extraction

Cited by: 0

Authors:

Lu, Xiaoyan [1,2]

Weng, Qihao [1,3,4]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Land Surveying & Geoinformat, JC STEM Lab Earth Observat, Hong Kong, Peoples R China

[2] Hong Kong Polytech Univ, Res Ctr Artificial Intelligence Geomatics, Hong Kong, Peoples R China

[3] Hong Kong Polytech Univ, Res Ctr Artificial Intelligence Geomatics, Hong Kong, Peoples R China

[4] Hong Kong Polytech Univ, Res Inst Land & Space, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Keywords:

Adaptation models; Remote sensing; Transformers; Task analysis; Urban areas; Roads; Computational modeling; High-resolution satellite imagery; man-made objects; segment anything model (SAM); unsupervised domain adaptation (UDA); urban areas; DOMAIN ROAD DETECTION; LEARNING FRAMEWORK;

DOI:

10.1109/TGRS.2024.3435745

Chinese Library Classification (CLC) codes:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification codes:

0708 ; 070902 ;

Abstract:

Mapping urban man-made objects, such as roads and buildings, from high-resolution remote sensing imagery is essential for monitoring global urbanization. However, the generalization ability of most existing models is limited by inconsistent data distributions across images from different regions. The emergence of the segment anything model (SAM) has significantly advanced image segmentation, primarily because of its strong zero-shot segmentation ability. However, SAM tends to underperform in various remote sensing tasks, such as road and building extraction, primarily due to the complexity of remote sensing imagery. This article introduced the multi-LoRA fine-tuned SAM (SAM_MLoRAF) framework, a simple yet effective network designed to extract urban man-made objects, which injected multiple parallel low-rank LoRA structures into the SAM encoder to approximate a high-rank LoRA, effectively mitigating the overfitting problem. In addition, it adopted a pyramid decoder to integrate multilevel information. For model optimization, supervised and unsupervised fine-tuning strategies were employed. In the first step, the SAM_MLoRA was trained on publicly available datasets in a supervised manner to adapt to the task of urban man-made object extraction. In the second step, based on the idea of consistency regularization, unsupervised fine-tuning was employed to adapt the model to the target region by leveraging unlabeled images from that region. Extensive experiments conducted on five continents have demonstrated that the proposed SAM_MLoRAF framework can efficiently leverage the robust segmentation capabilities of the SAM foundation model with few trainable parameters, and that most intersection-over-union (IoU) scores for mapping performance improved by over 10% compared to previous segmentation models. The code and datasets will be released at: https://github.com/xiaoyan07/SAM_MLoRA.
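Illustrative note: the abstract's central idea, several parallel low-rank LoRA branches whose summed update can reach a higher rank than any single branch, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions and not the authors' implementation: the class name `MultiLoRALinear`, the branch count, the rank, the `alpha / rank` scaling, and the zero initialization of the `B` matrices (borrowed from common LoRA practice) are all hypothetical choices made for illustration.

```python
import numpy as np


class MultiLoRALinear:
    """Frozen linear layer plus several parallel low-rank (LoRA) branches.

    Hypothetical illustration only: names and hyperparameters are assumptions,
    not taken from the paper or its code release.
    """

    def __init__(self, weight, num_branches=4, rank=2, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # Branch i holds A_i (rank x d_in) and B_i (d_out x rank).
        # B_i starts at zero, so the layer initially matches the frozen one.
        self.A = [rng.normal(0.0, 0.02, size=(rank, d_in))
                  for _ in range(num_branches)]
        self.B = [np.zeros((d_out, rank)) for _ in range(num_branches)]

    def delta(self):
        """Combined weight update: scale * sum_i (B_i @ A_i)."""
        return self.scale * sum(B @ A for A, B in zip(self.A, self.B))

    def __call__(self, x):
        # x has shape (batch, d_in); branch outputs are summed onto the
        # frozen layer's output.
        out = x @ self.weight.T
        for A, B in zip(self.A, self.B):
            out = out + self.scale * (x @ A.T) @ B.T
        return out


rng = np.random.default_rng(1)
W = rng.normal(size=(16, 16))
layer = MultiLoRALinear(W, num_branches=4, rank=2)
x = rng.normal(size=(3, 16))

base_out = x @ W.T   # output of the frozen layer alone
init_out = layer(x)  # identical at initialization because every B_i is zero

# Stand-in for training: give the B matrices nonzero values.
layer.B = [rng.normal(size=b.shape) for b in layer.B]
update_rank = np.linalg.matrix_rank(layer.delta())
merged_out = x @ (W + layer.delta()).T  # same result with the update merged
```

Each branch here contributes an update of rank at most 2, while the sum of four branches can reach rank up to 8, which is the sense in which parallel low-rank branches can approximate a single higher-rank update.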

Pages: 19