Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network

Cited by: 10
Authors
Yu, Hongfeng [1 ,2 ,3 ]
Yao, Fanglong [1 ,2 ,3 ]
Lu, Wanxuan [1 ,3 ]
Liu, Nayu [1 ,2 ,3 ]
Li, Peiguang [1 ,2 ,3 ]
You, Hongjian [1 ,2 ,3 ]
Sun, Xian [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Network Informat Syst Technol, NIST, Beijing 100190, Peoples R China
Keywords
Remote sensing; Image retrieval; Feature extraction; Semantics; Graph neural networks; Task analysis; Correlation; Cross-modal feature fusion; Cross-modal remote sensing (RS) image retrieval; Graph neural network (GNN); Classification; Features; Codes
DOI
10.1109/JSTARS.2022.3231851
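Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809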
Abstract
The rapid development of remote sensing (RS) technology has produced a massive volume of images, which makes it difficult to obtain interpretation results by manual screening. Researchers have therefore begun to develop automatic retrieval methods for RS images. In recent years, cross-modal RS image retrieval based on query text has attracted many researchers because of its flexibility and has become a new research trend. The primary problem, however, is that the information in the query text and in the RS image is not aligned. For example, RS images are often multiscale and contain multiple objects, so they are rich in information, whereas the query text contains only a few words and carries little information. Recently, graph neural networks (GNNs) have shown their potential in many tasks thanks to their powerful feature representation ability. Based on GNNs, this article therefore proposes a new cross-modal RS feature matching network. It avoids the degradation in retrieval performance caused by information misalignment by learning the feature interactions within the query text and within the RS image, respectively, and by modeling the feature association between the two modalities. Specifically, to fuse the within-modal features, text and RS image graph modules are designed based on GNNs. In addition, to match the query text and the RS image effectively, an image-text association module built on the multihead attention mechanism focuses on the parts of the text that are related to the RS image. Experiments on two public standard datasets verify the competitive performance of the proposed model.
Pages: 812-824
Page count: 13
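
The pipeline outlined in the abstract (within-modal graph fusion, then multihead attention from image to text, then matching) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fully connected graph with softmax dot-product edge weights, the absence of learned projections, the mean pooling, and the cosine score are all assumptions made only for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_step(nodes):
    """One round of within-modal message passing (hypothetical design).

    Assumes a fully connected graph whose edge weights are
    softmax-normalized dot-product affinities between node features.
    """
    out = []
    for q in nodes:
        weights = softmax([dot(q, k) for k in nodes])
        msg = [sum(w * k[d] for w, k in zip(weights, nodes))
               for d in range(len(q))]
        # Mix each node with its aggregated neighborhood message.
        out.append([0.5 * (a + b) for a, b in zip(q, msg)])
    return out

def multihead_cross_attention(queries, keys, num_heads):
    """Each query attends over the keys, one feature slice per head.

    Keys double as values; no learned projections (an assumption).
    """
    dim = len(queries[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    out = []
    for q in queries:
        fused = []
        for h in range(num_heads):
            s = slice(h * head_dim, (h + 1) * head_dim)
            scores = [dot(q[s], k[s]) / math.sqrt(head_dim) for k in keys]
            weights = softmax(scores)
            fused.extend(sum(w * k[s][d] for w, k in zip(weights, keys))
                         for d in range(head_dim))
        out.append(fused)
    return out

def mean_pool(nodes):
    return [sum(n[d] for n in nodes) / len(nodes)
            for d in range(len(nodes[0]))]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0

def match_score(text_nodes, image_nodes, num_heads=2):
    """Toy text-image matching score in [-1, 1]."""
    text_nodes = graph_step(text_nodes)      # fuse word features
    image_nodes = graph_step(image_nodes)    # fuse region features
    # Image features query the text, weighting the words related to the image.
    text_for_image = multihead_cross_attention(image_nodes, text_nodes, num_heads)
    return cosine(mean_pool(text_for_image), mean_pool(image_nodes))
```

For retrieval, a score like this would be computed between the query text and every candidate image, and the candidates ranked by descending score. In a trained model, the node features would come from text and image encoders, and the graph and attention weights would be learned rather than fixed.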