MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval

Cited by: 0

Authors:

Wang, Zhiming [1]

Dong, Zhihua [1]

Yang, Xiaoyu [1]

Wang, Zhiguo [1]

Yin, Guangqiang [1,2,3,4]

Affiliations:

[1] Univ Elect Sci & Technol China, Chengdu 611730, Peoples R China

[2] UESTC, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China

[3] Kashi Inst Elect & Informat Ind, Kashi 844199, Peoples R China

[4] Univ Elect Sci & Technol China, Kashi 844199, Peoples R China

Source:

PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023 | 2024, Vol. 1126

Keywords:

Multimodal graph attention network (MGAN); Cross-modal remote sensing (RS) text-image retrieval; Visual graph neural network

DOI:

10.1007/978-981-99-9243-0_27

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

Cross-modal text-image retrieval in remote sensing is a crucial task that requires the development of unified visual and textual representations. Previous research has relied mainly on global information, or on object features extracted by object detection algorithms to capture local information. However, these studies have overlooked the complexity of remote sensing images, leading to insufficient utilization of local information. To address this issue, we propose the Multimodal Graph Attention Network (MGAN), which is based on visual graph neural networks. Our MGAN architecture includes a multi-level node information fusion module that uses object features at different levels to generate local information, compensate for the limitations of global information, and produce more expressive visual features. Additionally, because text and objects in remote sensing images are correlated, we use visual features to guide the generation of text features. We conduct extensive experiments on the RSITMD dataset, demonstrating that our method outperforms state-of-the-art methods by a margin of 2.27% in mR.
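
The abstract reports its result in mR without defining it. In cross-modal retrieval work, mR commonly denotes the mean of R@1, R@5, and R@10 over both retrieval directions (text-to-image and image-to-text). The sketch below illustrates that convention only; it is not taken from the paper. The function names `recall_at_k` and `mean_recall` are hypothetical, and the code assumes a square similarity matrix in which query i matches exactly candidate i, a simplification since real benchmarks may pair one image with several captions.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose true match ranks within the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the ground-truth match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the true match: count of candidates scoring strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def mean_recall(sim, ks=(1, 5, 10)):
    """Average R@k over both directions (rows as queries, then columns as queries)."""
    sim_t = [list(col) for col in zip(*sim)]  # transpose swaps query and candidate roles
    scores = [recall_at_k(m, k) for m in (sim, sim_t) for k in ks]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3x3 similarity matrix; rows are text queries, columns are images.
    toy = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.1],
        [0.2, 0.1, 0.7],
    ]
    print(recall_at_k(toy, 1))
    print(mean_recall(toy, ks=(1, 2)))
```

Under this convention, a 2.27% gain in mR is an improvement in the averaged recall score rather than in any single R@k value.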


Pages: 261-273

Page count: 13
