Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval

Cited by: 0
Authors
Zhu, Lilu [1 ]
Wang, Yang [1 ]
Hu, Yanfeng [2 ]
Su, Xiaolu
Fu, Kun [2 ]
Affiliations
[1] Suzhou Aerospace Information Research Institute, Suzhou 215123, China
[2] Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
Keywords
Remote sensing; Optical sensors; Optical imaging; Feature extraction; Semantics; Visualization; Image retrieval; Content-based remote sensing image retrieval (CBRSIR); correlation-aware retrieval; cross-modal contrastive learning; hash index code; hierarchical semantic tree; BENCHMARK; DATASET;
DOI
10.1109/TGRS.2024.3417421
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline Codes
0708; 070902
Abstract
Optical satellites are the most widely used platforms for observing the Earth. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and varied spatiotemporal resolutions. Most previous works focus on representing optical images and transforming them into the semantic space of retrieval via supervised or unsupervised learning. These retrieval methods fail to fully leverage geospatial information, especially spatiotemporal features, which can improve both retrieval accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) that maximizes the mutual information across multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture with a vision encoder that operates on multiscale visual inputs and a lightweight text encoder that reconstructs spatiotemporal embeddings, and we apply an intermediate contrastive objective to the representations produced by the two unimodal encoders. A hash layer then transforms the deep fusion features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files, and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. Experimental results on three optical remote sensing datasets substantiate that CCLS2T improves retrieval performance by 11.64% over existing hash learning methods and by 9.91% over server-side retrieval engines in typical optical remote sensing retrieval scenarios.
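The abstract describes a general recipe: a heavier vision encoder for multiscale imagery, a lightweight text encoder for prompted spatiotemporal metadata, a contrastive objective aligning the two unimodal representations, and a hash layer that compresses the fused features into compact index codes. The sketch below is a minimal PyTorch illustration of that recipe only; the module names, dimensions, additive fusion, and InfoNCE-style loss are illustrative assumptions and do not reproduce the authors' CCLS2T implementation, R2STFT prompting, or RSHST indexing.

```python
# Minimal sketch of an asymmetric dual-encoder with a contrastive objective and
# a hash layer, assuming PyTorch. All names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisionEncoder(nn.Module):
    """Placeholder multiscale vision encoder (a real backbone would be a CNN/ViT)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, images):
        return self.backbone(images)


class SpatiotemporalTextEncoder(nn.Module):
    """Lightweight encoder for tokenized spatiotemporal metadata (prompted text)."""
    def __init__(self, vocab_size=10000, embed_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 128)
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings, then project into the shared space.
        return self.proj(self.embedding(token_ids).mean(dim=1))


class HashLayer(nn.Module):
    """Maps fused features to relaxed binary codes; sign() is applied at index time."""
    def __init__(self, embed_dim=256, code_bits=64):
        super().__init__()
        self.fc = nn.Linear(embed_dim, code_bits)

    def forward(self, fused):
        return torch.tanh(self.fc(fused))  # values in (-1, 1); binarize for retrieval


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched image/metadata pairs within a batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    vision, text, hasher = VisionEncoder(), SpatiotemporalTextEncoder(), HashLayer()
    images = torch.randn(8, 3, 224, 224)       # batch of image patches
    tokens = torch.randint(0, 10000, (8, 32))  # prompted spatiotemporal metadata tokens
    v, t = vision(images), text(tokens)
    loss = contrastive_loss(v, t)              # aligns the two unimodal spaces
    codes = torch.sign(hasher(v + t))          # compact hash codes from simple additive fusion
    print(loss.item(), codes.shape)
```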
Pages: 21
Related Papers
50 records
  • [1] Cross-Modal Contrastive Learning for Remote Sensing Image Classification
    Feng, Zhixi; Song, Liangliang; Yang, Shuyuan; Zhang, Xinyu; Jiao, Licheng
    IEEE Transactions on Geoscience and Remote Sensing, 2023, 61
  • [2] Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing
    Mikriukov, Georgii; Ravanbakhsh, Mahdyar; Demir, Begum
    2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4463-4467
  • [3] Cross-Modal Remote Sensing Image-Audio Retrieval With Adaptive Learning for Aligning Correlation
    Huang, Jinghao; Chen, Yaxiong; Xiong, Shengwu; Lu, Xiaoqiang
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [4] Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval
    Lv, Yafei; Xiong, Wei; Zhang, Xiaohan; Cui, Yaqi
    IEEE Geoscience and Remote Sensing Letters, 2022, 19
  • [5] A Fusion-Based Contrastive Learning Model for Cross-Modal Remote Sensing Retrieval
    Li, Haoran; Xiong, Wei; Cui, Yaqi; Xiong, Zhenyu
    International Journal of Remote Sensing, 2022, 43(9): 3359-3386
  • [6] A Cross-Modal Image Retrieval Method Based on Contrastive Learning
    Zhou, Wen
    Journal of Optics-India, 2023, 53(3): 2098-2107
  • [7] Query Aware Dual Contrastive Learning Network for Cross-Modal Retrieval
    Yin, M.-R.; Liang, M.-Y.; Yu, Y.; Cao, X.-W.; Du, J.-P.; Xue, Z.
    Ruan Jian Xue Bao/Journal of Software, 2024, 35(5): 2120-2132
  • [8] Masking-Based Cross-Modal Remote Sensing Image-Text Retrieval via Dynamic Contrastive Learning
    Zhao, Zuopeng; Miao, Xiaoran; He, Chen; Hu, Jianfeng; Min, Bingbing; Gao, Yumeng; Liu, Ying; Pharksuwan, Kanyaphakphachsorn
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-15
  • [9] Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval
    Yuan, Zhiqiang; Zhang, Wenkai; Fu, Kun; Li, Xuan; Deng, Chubo; Wang, Hongqi; Sun, Xian
    IEEE Transactions on Geoscience and Remote Sensing, 2022, 60
  • [10] Deep Cross-Modal Retrieval for Remote Sensing Image and Audio
    Guo, Mao; Yuan, Yuan; Lu, Xiaoqiang
    2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS), 2018