Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation

Cited by: 0
Authors
Lei, Sen [1]
Xiao, Xinyu [2]
Zhang, Tianlin [3]
Li, Heng-Chao [1]
Shi, Zhenwei [4]
Zhu, Qing [5]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 611756, Peoples R China
[2] Co Ant Grp, Hangzhou 688688, Peoples R China
[3] AVIC, Luoyang Inst Electroopt Equipment, Luoyang 471000, Peoples R China
[4] Beihang Univ, Image Proc Ctr, Sch Astronaut, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
[5] Southwest Jiaotong Univ, Fac Geosci & Engn, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Remote sensing; Image segmentation; Visualization; Feature extraction; Linguistics; Transformers; Electronic mail; Adaptation models; Object recognition; Grounding; Fine-grained image-text alignment; referring image segmentation; remote sensing images; CLASSIFICATION; NETWORK;
DOI
10.1109/TGRS.2024.3522293
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Subject classification code
0708; 070902
Abstract
Given a language expression, referring remote sensing image segmentation (RRSIS) aims to identify the referred ground objects and assign pixelwise labels to them within the imagery. One of the key challenges of this task is capturing discriminative multimodal features through image-text alignment. However, existing RRSIS methods rely on a single vanilla, coarse alignment, in which features are extracted directly from the language expression and fused with the visual features. In this article, we argue that a "fine-grained image-text alignment" can improve the extraction of multimodal information. To this end, we propose a new RRSIS method that fully exploits the visual and linguistic representations. Specifically, the original referring expression is regarded as context text, which is further decoupled into ground object text and spatial position text. The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and these corresponding texts to obtain a more discriminative multimodal representation. Meanwhile, to handle the varying scales of ground objects in remote sensing imagery, we introduce a text-aware multiscale enhancement module (TMEM) that adaptively performs cross-scale fusion and interaction. We evaluate the proposed method on two public RRSIS datasets, RefSegRS and RRSIS-D, where it outperforms several state-of-the-art methods. The code will be publicly available at https://github.com/Shaosifan/FIANet.
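The abstract describes two steps: decoupling the referring expression into context, ground object, and spatial position texts, and aligning the image features with each of those texts. The Python sketch below is a hypothetical illustration of these two ideas under stated assumptions; it is not the FIAM implementation, which the authors announce at the GitHub URL above. Everything in it is assumed for illustration: the keyword-based `decouple()` splitter and its `SPATIAL_WORDS`/`STOP_WORDS` vocabularies stand in for whatever decoupling the paper actually uses, the toy 2-D vectors stand in for learned image and text encoder outputs, and `align()` is plain single-head scaled dot-product cross-attention without learned projections, in which every pixel feature attends over the three text embeddings and adds the result back through a residual connection.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical vocabularies for this illustration only; the paper's actual
# decoupling procedure may differ.
SPATIAL_WORDS = {"left", "right", "top", "bottom", "upper", "lower",
                 "middle", "center", "near", "above", "below", "beside"}
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "at", "to", "is", "and"}


def decouple(expression: str) -> Dict[str, str]:
    """Split a referring expression into context, object and position texts."""
    tokens = expression.lower().split()
    position = [t for t in tokens if t in SPATIAL_WORDS]
    obj = [t for t in tokens if t not in SPATIAL_WORDS and t not in STOP_WORDS]
    return {"context": expression, "object": " ".join(obj),
            "position": " ".join(position)}


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(pixels: List[Vector], texts: Dict[str, Vector]) -> List[Vector]:
    """Toy cross-attention: each pixel feature attends over the text embeddings."""
    names = sorted(texts)
    d = len(pixels[0])
    fused = []
    for p in pixels:
        # Scaled dot-product score between this pixel and every text embedding.
        scores = [sum(a * b for a, b in zip(p, texts[n])) / math.sqrt(d)
                  for n in names]
        weights = softmax(scores)
        # Weighted sum of text embeddings (keys double as values, no projections).
        attended = [sum(w * texts[n][i] for w, n in zip(weights, names))
                    for i in range(d)]
        # Residual connection keeps the original visual feature.
        fused.append([pi + ai for pi, ai in zip(p, attended)])
    return fused


if __name__ == "__main__":
    print(decouple("The vehicle on the upper left"))
    pixels = [[1.0, 0.0], [0.0, 1.0]]
    texts = {"context": [1.0, 1.0], "object": [2.0, 0.0], "position": [0.0, 2.0]}
    for row in align(pixels, texts):
        print([round(x, 3) for x in row])
```

In this toy version each pixel computes its own weights over the three texts, so a pixel whose feature resembles the object embedding is pulled toward it while another pixel can favor the position embedding. Fusing one whole-expression embedding identically into every pixel would lose that per-text distinction.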
Pages: 11
Related papers (50 in total)
  • [31] Multitask Fine-Grained Feature Mining for Multilabel Remote Sensing Image Classification
    Guo, Jie
    Sun, Hao
    Han, Jinheng
    Song, Bin
    Chi, Yuhao
    Song, Bingxi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 1
  • [32] Fine-Grained Image Recognition Methods and Their Applications in Remote Sensing Images: A Review
    Chu, Yang
    Ye, Minchao
    Qian, Yuntao
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 19640 - 19667
  • [33] Fine-grained parallel algorithm for remote sensing image mosaics for cluster system
    An, Xinghua
    Wang, Xiaoge
    Du, Zhihui
    Liu, Dingsheng
    Li, Guoqing
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2002, 42 (10) : 1389 - 1392
  • [34] Fine-grained damage detection of cement concrete pavement based on UAV remote sensing image segmentation and stitching
    Feng, Shuangda
    Gao, Mingxing
    Jin, Xiaowei
    Zhao, Ting
    Ang, Feng Y.
    MEASUREMENT, 2024, 226
  • [35] Fine-Grained Human Hair Segmentation Using a Text-to-Image Diffusion Model
    Kim, Dohyun
    Lee, Euna
    Yoo, Daehyun
    Lee, Hongchul
    IEEE ACCESS, 2024, 12 : 13912 - 13922
  • [36] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297
  • [37] TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
    Li, Qiqi
    Ma, Longfei
    Jiang, Zheng
    Li, Mingyong
    Jin, Bo
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02) : 3713 - 3728
  • [38] Relation-Aggregated Cross-Graph Correlation Learning for Fine-Grained Image-Text Retrieval
    Peng, Shu-Juan
    He, Yi
    Liu, Xin
    Cheung, Yiu-ming
    Xu, Xing
    Cui, Zhen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 2194 - 2207
  • [39] Image-Text Matching with Fine-Grained Relational Dependency and Bidirectional Attention-Based Generative Networks
    Zhu, Jianwei
    Li, Zhixin
    Zeng, Yufei
    Wei, Jiahui
    Ma, Huifang
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022
  • [40] Exploring Misclassification Information for Fine-Grained Image Classification
    Wang, Da-Han
    Zhou, Wei
    Li, Jianmin
    Wu, Yun
    Zhu, Shunzhi
    SENSORS, 2021, 21 (12)