MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval

Authors
Wang, Zhiming [1]
Dong, Zhihua [1]
Yang, Xiaoyu [1]
Wang, Zhiguo [1]
Yin, Guangqiang [1, 2, 3, 4]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611730, Peoples R China
[2] UESTC, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Kashi Inst Elect & Informat Ind, Kashi 844199, Peoples R China
[4] Univ Elect Sci & Technol China, Kashi 844199, Peoples R China
Keywords
Multimodal graph attention network (MGAN); Cross-modal remote sensing (RS) text-image retrieval; Visual graph neural network
DOI
10.1007/978-981-99-9243-0_27
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Cross-modal text-image retrieval in remote sensing is a crucial task that requires unified visual and textual representations. Previous research has relied primarily on global information, or on object features extracted by object detection algorithms to obtain local information. However, these studies have overlooked the complexity of remote sensing images, leading to insufficient use of local information. To address this issue, we propose the Multimodal Graph Attention Network (MGAN), which is based on visual graph neural networks. The MGAN architecture includes a multi-level node information fusion module that uses object features at different levels to generate local information, compensate for the limitations of global information, and produce more expressive visual features. Additionally, because text and objects in remote sensing images are correlated, we use the visual features to guide the generation of text features. Extensive experiments on the RSITMD dataset demonstrate that our method outperforms state-of-the-art methods by a margin of 2.27% in mean recall (mR).
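The abstract reports its gain in mR without defining the metric. The sketch below shows one common way mR is computed in cross-modal retrieval work: the average of Recall@1, Recall@5, and Recall@10 over both retrieval directions. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `recall_at_k` and `mean_recall` are hypothetical, and each query is assumed to have exactly one ground-truth match, whereas captioned datasets usually pair several captions with each image.

```python
import numpy as np


def recall_at_k(similarity, gt_index, k):
    """Fraction of queries whose ground-truth item is in the top-k candidates.

    similarity: (num_queries, num_candidates) array, higher = more similar.
    gt_index:   (num_queries,) array with the correct candidate per query
                (assumption: a single ground-truth match per query).
    """
    # Rank candidates for each query from most to least similar.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = (top_k == gt_index[:, None]).any(axis=1)
    return float(hits.mean())


def mean_recall(sim_t2i, gt_t2i, sim_i2t, gt_i2t, ks=(1, 5, 10)):
    """Average of R@k over both retrieval directions, in percent."""
    recalls = [recall_at_k(sim_t2i, gt_t2i, k) for k in ks]
    recalls += [recall_at_k(sim_i2t, gt_i2t, k) for k in ks]
    return 100.0 * sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy scores: both queries rank the wrong candidate first.
    sim = np.array([[0.1, 0.9],
                    [0.8, 0.2]])
    gt = np.array([0, 1])
    print(round(mean_recall(sim, gt, sim.T, gt), 2))
```

Under this convention, a difference in mR is a difference between two such averages, so a gain at one cutoff can be offset by a loss at another.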
Pages: 261-273
Page count: 13