Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval

Cited by: 15
Authors
Al Rahhal, Mohamad M. [1]
Bazi, Yakoub [2]
Alsharif, Norah A. [2]
Bashmal, Laila [2]
Alajlan, Naif [2]
Melgani, Farid [3]
Affiliations
[1] King Saud Univ, Coll Appl Comp Sci, Appl Comp Sci Dept, Riyadh 4545, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Engn, Riyadh 4545, Saudi Arabia
[3] Univ Trento, Dept Informat Engn & Comp Sci, I-38123 Trento, Italy
Keywords
Feature extraction; Transformers; Visualization; Task analysis; Image retrieval; Semantics; Optical filters; Contrastive loss; cross-modal retrieval; language transformer; remote sensing; vision transformer; NETWORK;
DOI
10.1109/JSTARS.2022.3215803
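Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract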
Cross-modal text-image retrieval in remote sensing (RS) provides a flexible retrieval experience for mining useful information from RS repositories. However, existing methods accept queries formulated in English only, which may restrict access to useful information for non-English speakers. Allowing multilanguage queries can improve communication with the retrieval system and broaden access to RS information. To address this limitation, this article proposes a multilanguage framework based on transformers. Specifically, our framework is composed of two transformer encoders that learn modality-specific representations: the first is a language encoder that generates language representation features from the textual description, and the second is a vision encoder that extracts visual features from the corresponding image. The two encoders are trained jointly on image and text pairs by minimizing a bidirectional contrastive loss. To enable the model to understand queries in multiple languages, we trained it on descriptions in four languages, namely English, Arabic, French, and Italian. The experimental results on three benchmark datasets (RSITMD, RSICD, and UCM) demonstrate that the proposed model significantly improves retrieval performance in terms of recall compared with existing state-of-the-art RS retrieval methods.
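The training objective and the evaluation metric named in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style symmetric cross-entropy over cosine similarities; the abstract does not give the exact formulation, so the temperature value, the equal weighting of the two directions, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math

def l2_normalize(v):
    # Scale a feature vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_feats, text_feats, temperature=0.07):
    # Rows index images, columns index texts.
    # temperature=0.07 is an assumed value, not taken from the paper.
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct match for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def bidirectional_contrastive_loss(image_feats, text_feats, temperature=0.07):
    # Average of the image-to-text and text-to-image losses
    # (equal weighting is an assumption).
    logits = similarity_matrix(image_feats, text_feats, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

def recall_at_k(logits, k):
    # Fraction of rows whose matching column ranks among the k highest scores.
    hits = 0
    for i, row in enumerate(logits):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(logits)
```

Passing the transposed similarity matrix to `recall_at_k` gives text-to-image recall, which is the direction a text query against an image repository would use. In practice the feature vectors would come from the language and vision encoders; the toy lists used here stand in for those outputs.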
Pages: 9115-9126
Page count: 12