Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network

Times cited: 10

Authors:

Yu, Hongfeng [1,2,3]

Yao, Fanglong [1,2,3]

Lu, Wanxuan [1,3]

Liu, Nayu [1,2,3]

Li, Peiguang [1,2,3]

You, Hongjian [1,2,3]

Sun, Xian [1,2,3]

Affiliations:

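[1] Chinese Academy of Sciences, Aerospace Information Research Institute, Beijing 100190, People's Republic of China

[2] University of Chinese Academy of Sciences, School of Electronic, Electrical and Communication Engineering, Beijing 100190, People's Republic of China

[3] Chinese Academy of Sciences, Aerospace Information Research Institute, Key Laboratory of Network Information System Technology (NIST), Beijing 100190, People's Republic of China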

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2023 / Vol. 16

Keywords:

Remote sensing; Image retrieval; Feature extraction; Semantics; Graph neural networks; Task analysis; Correlation; Cross-modal feature fusion; cross-modal remote sensing (RS) image retrieval; graph neural network (GNN); CLASSIFICATION; FEATURES; CODES;

DOI:

10.1109/JSTARS.2022.3231851

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Subject classification codes:

0808; 0809

Abstract:

The rapid development of remote sensing (RS) technology has produced massive volumes of images, which makes it difficult to obtain interpretation results by manual screening. Researchers have therefore begun to develop automatic retrieval methods for RS images. In recent years, cross-modal RS image retrieval based on query text has attracted many researchers because of its flexibility and has become a new research trend. However, the primary problem it faces is that the information in the query text and in the RS image is not aligned. For example, RS images often contain multiple objects at multiple scales and therefore carry rich information, while the query text contains only a few words and carries little information. Recently, graph neural networks (GNNs) have shown their potential in many tasks owing to their powerful feature representation ability. Based on GNNs, this article therefore proposes a new cross-modal RS feature matching network, which can avoid the degradation of retrieval performance caused by information misalignment by learning the feature interactions within the query text and within the RS image, respectively, and by modeling the feature associations between the two modalities. Specifically, to fuse the within-modal features, text and RS image graph modules are designed based on GNNs. In addition, to match the query text and the RS image effectively, an image-text association module is built on the multihead attention mechanism to focus on the parts of the text that are related to the RS image. Experiments on two public standard datasets verify the competitive performance of the proposed model.
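The abstract names three components: within-modal graph modules for text and for the RS image, a multihead-attention image-text association module, and a matching score used for retrieval. The NumPy sketch below shows how such a pipeline could fit together. It is not the authors' implementation, and every concrete choice in it is an assumption: fully connected word and region graphs, a mean-aggregation GNN layer with ReLU, image regions used as attention queries over text tokens (so each region selects the image-relevant part of the text), mean pooling, cosine similarity as the score, and random vectors standing in for real word embeddings and region features. The function names `graph_layer`, `cross_attention` and `match_score` are hypothetical.

```python
import numpy as np

def graph_layer(x, adj, w):
    """One GNN message-passing step (assumed form): average each node with its
    neighbours, project with w, apply ReLU. x: (n, d), adj: (n, n) 0/1, w: (d, d)."""
    a = adj + np.eye(adj.shape[0])            # add self-loops
    a = a / a.sum(axis=1, keepdims=True)      # row-normalise -> mean aggregation
    return np.maximum(a @ x @ w, 0.0)

def fully_connected(n):
    """Assumed graph: every node (word or image region) linked to every other."""
    return np.ones((n, n)) - np.eye(n)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv, num_heads):
    """Multi-head scaled dot-product attention: each image region (query) attends
    over the text tokens (context) and returns a text summary relevant to it."""
    n_q, d = queries.shape
    dh = d // num_heads
    def split(m):                              # (n, d) -> (heads, n, dh)
        return m.reshape(m.shape[0], num_heads, dh).transpose(1, 0, 2)
    q, k, v = split(queries @ wq), split(context @ wk), split(context @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, n_q, n_ctx)
    out = (attn @ v).transpose(1, 0, 2).reshape(n_q, d)      # concatenate heads
    return out, attn

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_score(text_tokens, image_regions, p, num_heads=2):
    # Within-modal fusion: one graph layer per modality.
    t = graph_layer(text_tokens, fully_connected(len(text_tokens)), p["w_text"])
    r = graph_layer(image_regions, fully_connected(len(image_regions)), p["w_img"])
    # Cross-modal association: keep the parts of the text related to the image.
    attended_text, _ = cross_attention(r, t, p["wq"], p["wk"], p["wv"], num_heads)
    # Compare the image with its image-relevant text summary (mean-pooled).
    return cosine(attended_text.mean(axis=0), r.mean(axis=0))

if __name__ == "__main__":
    d = 8
    rng = np.random.default_rng(0)
    params = {name: rng.normal(scale=d ** -0.5, size=(d, d))
              for name in ("w_text", "w_img", "wq", "wk", "wv")}
    query = rng.normal(size=(5, d))                          # 5 word vectors (random stand-ins)
    gallery = [rng.normal(size=(12, d)) for _ in range(3)]   # 3 images x 12 region vectors
    scores = [match_score(query, img, params) for img in gallery]
    ranking = sorted(range(len(gallery)), key=lambda i: -scores[i])
    print("scores:", [round(s, 3) for s in scores], "ranking:", ranking)
```

Retrieval then reduces to scoring every gallery image against the query text and sorting by score. In a trained model the weight matrices would be learned, for example with a ranking loss over matched and unmatched pairs, rather than drawn at random as they are here.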

Citations

Pages: 812-824

Number of pages: 13

50 entries in total

[21] Cross-modal alignment with graph reasoning for image-text retrieval
Cui, Zheng
Hu, Yongli
Sun, Yanfeng
Gao, Junbin
Yin, Baocai
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 23615 - 23632
[22] A Texture and Saliency Enhanced Image Learning Method for Cross-Modal Remote Sensing Image-Text Retrieval
Yang, Rui
Zhang, Di
Guo, YanHe
Wang, Shuang
[J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4895 - 4898
[24] Robust Cross-Modal Remote Sensing Image Retrieval via Maximal Correlation Augmentation
Wang, Zhuoyue
Wang, Xueqian
Li, Gang
Li, Chengxi
[J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[25] Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval
Zheng, Juncheng
Liang, Meiyu
Yu, Yang
Du, Junping
Xue, Zhe
[J]. 2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 97 - 100
[26] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
Li, Zhengxin
Zhao, Wenzhe
Du, Xuanyi
Zhou, Guangyao
Zhang, Songlin
[J]. REMOTE SENSING, 2024, 16 (01)
[27] Masking-Based Cross-Modal Remote Sensing Image-Text Retrieval via Dynamic Contrastive Learning
Zhao, Zuopeng
Miao, Xiaoran
He, Chen
Hu, Jianfeng
Min, Bingbing
Gao, Yumeng
Liu, Ying
Pharksuwan, Kanyaphakphachsorn
[J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
[28] Cross-Modal Information Interaction Reasoning Network for Image and Text Retrieval
Wei, Yuqi
Li, Ning
[J]. COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (16) : 115 - 124
[29] Cross-modal Semantically Augmented Network for Image-text Matching
Yao, Tao
Li, Yiru
Li, Ying
Zhu, Yingying
Wang, Gang
Yue, Jun
[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
[30] Learning Text-image Joint Embedding for Efficient Cross-modal Retrieval with Deep Feature Engineering
Xie, Zhongwei
Liu, Ling
Wu, Yanzhao
Zhong, Luo
Li, Lin
[J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2022, 40 (04)
