MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval

Cited by: 0
Authors
Wang, Zhiming [1]
Dong, Zhihua [1]
Yang, Xiaoyu [1]
Wang, Zhiguo [1]
Yin, Guangqiang [1, 2, 3, 4]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611730, Peoples R China
[2] UESTC, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Kashi Inst Elect & Informat Ind, Kashi 844199, Peoples R China
[4] Univ Elect Sci & Technol China, Kashi 844199, Peoples R China
Keywords
Multimodal graph attention network (MGAN); Cross-modal remote sensing (RS) text-image retrieval; Visual graph neural network
DOI
10.1007/978-981-99-9243-0_27
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Cross-modal text-image retrieval in remote sensing is a crucial task that requires unified visual and textual representations. Previous research has relied mainly on global information, or on object features extracted by object detection algorithms as a source of local information. However, these studies overlook the complexity of remote sensing images and therefore make insufficient use of local information. To address this issue, we propose the Multimodal Graph Attention Network (MGAN), which is based on visual graph neural networks. MGAN includes a multi-level node information fusion module that uses object features at different levels to generate local information, compensate for the limitations of global information, and produce more expressive visual features. Because text and objects in remote sensing images are correlated, we also incorporate visual features to guide the generation of text features. Extensive experiments on the RSITMD dataset show that our method outperforms state-of-the-art methods by a margin of 2.27% in mean recall (mR).
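The abstract reports its gain in mR without defining the metric. The sketch below shows one common way mR is computed in remote sensing text-image retrieval work: as the average of Recall@1, Recall@5, and Recall@10 over both retrieval directions. This definition is an assumption, not something the record states, and the function names and rank lists are hypothetical illustration data rather than results from the paper.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose first correct match has 1-based rank <= k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def mean_recall(text_to_image_ranks, image_to_text_ranks, ks=(1, 5, 10)):
    """Average Recall@k over both retrieval directions (assumed mR definition)."""
    scores = [
        recall_at_k(ranks, k)
        for ranks in (text_to_image_ranks, image_to_text_ranks)
        for k in ks
    ]
    return sum(scores) / len(scores)


# Hypothetical ranks of the first correct match for four queries per direction.
t2i_ranks = [1, 3, 7, 12]
i2t_ranks = [2, 5, 11, 1]
mr = mean_recall(t2i_ranks, i2t_ranks)
```

Under this reading, a 2.27% margin in mR is a difference in this averaged score between two methods, in percentage points.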
Pages: 261-273
Number of pages: 13
Related papers
50 in total
  • [1] Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
    Yu, Hongfeng
    Yao, Fanglong
    Lu, Wanxuan
    Liu, Nayu
    Li, Peiguang
    You, Hongjian
    Sun, Xian
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 812 - 824
  • [2] Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
    Yuan, Zhiqiang
    Zhang, Wenkai
    Tian, Changyuan
    Rong, Xuee
    Zhang, Zhengyuan
    Wang, Hongqi
    Fu, Kun
    Sun, Xian
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [3] A Jointly Guided Deep Network for Fine-Grained Cross-Modal Remote Sensing Text-Image Retrieval
    Yang, Lei
    Feng, Yong
    Zhou, Mingling
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (13)
  • [4] SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval
    Ji, Zhong
    Wang, Haoran
    Han, Jungong
    Pang, Yanwei
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1086 - 1097
  • [5] Hypersphere-Based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning
    Zhang, Weihang
    Li, Jihao
    Li, Shuoke
    Chen, Jialiang
    Zhang, Wenkai
    Gao, Xin
    Sun, Xian
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [6] A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text-Image Retrieval in Remote Sensing
    Zhang, Xiong
    Li, Weipeng
    Wang, Xu
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Long
    Zhang, Haisu
    [J]. REMOTE SENSING, 2023, 15 (18)
  • [7] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [8] Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval
    Zheng, Juncheng
    Liang, Meiyu
    Yu, Yang
    Du, Junping
    Xue, Zhe
    [J]. 2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024 : 97 - 100
  • [9] A Cross-Attention Mechanism Based on Regional-Level Semantic Features of Images for Cross-Modal Text-Image Retrieval in Remote Sensing
    Zheng, Fuzhong
    Li, Weipeng
    Wang, Xu
    Wang, Luyao
    Zhang, Xiong
    Zhang, Haisu
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (23)
  • [10] Improving text-image cross-modal retrieval with contrastive loss
    Chumeng Zhang
    Yue Yang
    Junbo Guo
    Guoqing Jin
    Dan Song
    An An Liu
    [J]. Multimedia Systems, 2023, 29 : 569 - 575