Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network

被引:10
|
作者
Yu, Hongfeng [1 ,2 ,3 ]
Yao, Fanglong [1 ,2 ,3 ]
Lu, Wanxuan [1 ,3 ]
Liu, Nayu [1 ,2 ,3 ]
Li, Peiguang [1 ,2 ,3 ]
You, Hongjian [1 ,2 ,3 ]
Sun, Xian [1 ,2 ,3 ]
机构
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Network Informat Syst Technol, NIST, Beijing 100190, Peoples R China
关键词
Remote sensing; Image retrieval; Feature extraction; Semantics; Graph neural networks; Task analysis; Correlation; Cross-modal feature fusion; cross-modal remote sensing (RS) image retrieval; graph neural network (GNN); CLASSIFICATION; FEATURES; CODES;
D O I
10.1109/JSTARS.2022.3231851
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
The rapid development of remote sensing (RS) technology has produced massive images, which makes it difficult to obtain interpretation results by manual screening. Therefore, researchers began to develop automatic retrieval method of RS images. In recent years, cross-modal RS image retrieval based on query text has attracted many researchers because of its flexible and has become a new research trend. However, the primary problem faced is that the information of query text and RS image is not aligned. For example, RS images often have the attributes of multiscale and multiobjective, and the amount of information is rich, while the query text contains only a few words, and the information is scarce. Recently, graph neural network (GNN) has shown its potential in many tasks with its powerful feature representation ability. Therefore, based on GNN, this article proposes a new cross-modal RS feature matching network, which can avoid the degradation of retrieval performance caused by information misalignment by learning the feature interaction in query text and RS image, respectively, and modeling the feature association between the two modes. Specifically, to fuse the within-modal features, the text and RS image graph modules are designed based on GNN. In addition, in order to effectively match the query text and RS image, combined with the multihead attention mechanism, an image-text association module is constructed to focus on the parts related to RS image in the text. The experiments on two public standard datasets verify the competitive performance of the proposed model.
引用
收藏
页码:812 / 824
页数:13
相关论文
共 50 条
  • [1] MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval
    Wang, Zhiming
    Dong, Zhihua
    Yang, Xiaoyu
    Wang, Zhiguo
    Yin, Guangqiang
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 261 - 273
  • [2] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [3] Hypersphere-Based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning
    Zhang, Weihang
    Li, Jihao
    Li, Shuoke
    Chen, Jialiang
    Zhang, Wenkai
    Gao, Xin
    Sun, Xian
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [4] Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
    Yuan, Zhiqiang
    Zhang, Wenkai
    Tian, Changyuan
    Rong, Xuee
    Zhang, Zhengyuan
    Wang, Hongqi
    Fu, Kun
    Sun, Xian
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [5] A Jointly Guided Deep Network for Fine-Grained Cross-Modal Remote Sensing Text-Image Retrieval
    Yang, Lei
    Feng, Yong
    Zhou, Mingling
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (13)
  • [6] A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text-Image Retrieval in Remote Sensing
    Zhang, Xiong
    Li, Weipeng
    Wang, Xu
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Long
    Zhang, Haisu
    [J]. REMOTE SENSING, 2023, 15 (18)
  • [7] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [8] Improving text-image cross-modal retrieval with contrastive loss
    Chumeng Zhang
    Yue Yang
    Junbo Guo
    Guoqing Jin
    Dan Song
    An An Liu
    [J]. Multimedia Systems, 2023, 29 : 569 - 575
  • [9] Improving text-image cross-modal retrieval with contrastive loss
    Zhang, Chumeng
    Yang, Yue
    Guo, Junbo
    Jin, Guoqing
    Song, Dan
    Liu, An An
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
  • [10] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297