Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval

Cited: 0
Authors
Zhu, Lilu [1 ]
Wang, Yang [1 ]
Hu, Yanfeng [2 ]
Su, Xiaolu
Fu, Kun [2 ]
Affiliations
[1] Suzhou Aerosp Informat Res Inst, Suzhou 215123, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
Keywords
Remote sensing; Optical sensors; Optical imaging; Feature extraction; Semantics; Visualization; Image retrieval; Content-based remote sensing image retrieval (CBRSIR); correlation-aware retrieval; cross-modal contrastive learning; hash index code; hierarchical semantic tree; BENCHMARK; DATASET;
DOI
10.1109/TGRS.2024.3417421
Chinese Library Classification (CLC)
P3 [Geophysics]; P59 [Geochemistry];
Discipline Classification Codes
0708 ; 070902 ;
Abstract
Optical satellites are the most widely used platforms for observing Earth. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and varied spatiotemporal resolutions. Most previous works delve into optical image representation and its transformation into the semantic space of retrieval via supervised or unsupervised learning. These retrieval methods fail to fully leverage geospatial information, especially spatiotemporal features, which could improve both retrieval accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) to maximize the mutual information of multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture with a vision encoder that operates on multiscale visual inputs and a lightweight text encoder that reconstructs spatiotemporal embeddings, and we adopt an intermediate contrastive objective on the representations from the unimodal encoders. Then, we add a hash layer to transform the deep fusion features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. The experimental results on three optical remote sensing datasets substantiate that the proposed CCLS2T improves retrieval performance by 11.64% and 9.91% over many existing hash learning methods and server-side retrieval engines, respectively, in typical optical remote sensing retrieval scenarios.
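The abstract describes a dual-encoder design trained with a cross-modal contrastive objective, followed by a hash layer that binarizes fused features into compact index codes. The sketch below is not the authors' CCLS2T implementation; it is a minimal, generic illustration of those two ingredients, assuming a standard symmetric InfoNCE loss between paired image/metadata embeddings and a toy sign-binarization hash layer (the random projection is a hypothetical stand-in for the learned layer in the paper).

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project embeddings onto the unit sphere before computing similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    Matched pairs (row i, column i) are positives; every other pair in the
    batch serves as a negative, pulling aligned cross-modal representations
    together and pushing mismatched ones apart.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature              # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)     # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image retrieval directions.
    return 0.5 * (xent(logits) + xent(logits.T))

def hash_codes(features, n_bits=64, rng=None):
    # Toy hash layer: random projection + sign binarization, standing in
    # for the learned hash layer that produces compact index codes.
    rng = np.random.default_rng(0) if rng is None else rng
    proj = rng.standard_normal((features.shape[1], n_bits))
    return (features @ proj > 0).astype(np.uint8)   # {0,1} index codes

rng = np.random.default_rng(42)
img = rng.standard_normal((8, 128))                 # vision-encoder outputs
txt = img + 0.1 * rng.standard_normal((8, 128))     # loosely aligned text side
loss = infonce_loss(img, txt)
codes = hash_codes(img, n_bits=64, rng=rng)
```

Because the synthetic text embeddings are small perturbations of the image embeddings, the contrastive loss is low; retrieval then reduces to Hamming-distance comparison over the 64-bit codes, which is what makes hash indexing fast at scale.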
Pages: 21
Related Papers
50 records in total
  • [41] Towards Sketch-Based Image Retrieval with Deep Cross-Modal Correlation Learning
    Huang, Fei
    Jin, Cheng
    Zhang, Yuejie
    Zhang, Tao
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 907 - 912
  • [42] Cross-Modal feature description for remote sensing image matching
    Li, Liangzhi
    Liu, Ming
    Ma, Lingfei
    Han, Ling
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 112
  • [43] CMIR-NET: A deep learning-based model for cross-modal retrieval in remote sensing
    Chaudhuri, Ushasi
    Banerjee, Biplab
    Bhattacharya, Avik
    Datcu, Mihai
    PATTERN RECOGNITION LETTERS, 2020, 131 : 456 - 462
  • [44] Early-Learning Regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels
    Xu, Tianyuan
    Liu, Xueliang
    Huang, Zhen
    Guo, Dan
    Hong, Richang
    Wang, Meng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [45] Integrating Multisubspace Joint Learning With Multilevel Guidance for Cross-Modal Retrieval of Remote Sensing Images
    Chen, Yaxiong
    Huang, Jirui
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 17
  • [46] Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
    Yu, Hongfeng
    Yao, Fanglong
    Lu, Wanxuan
    Liu, Nayu
    Li, Peiguang
    You, Hongjian
    Sun, Xian
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 812 - 824
  • [47] Category-Level Contrastive Learning for Unsupervised Hashing in Cross-Modal Retrieval
    Xu, Mengying
    Luo, Linyin
    Lai, Hanjiang
    Yin, Jian
    DATA SCIENCE AND ENGINEERING, 2024, 9 (03) : 251 - 263
  • [48] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [49] Self-Supervised Correlation Learning for Cross-Modal Retrieval
    Liu, Yaxin
    Wu, Jianlong
    Qu, Leigang
    Gan, Tian
    Yin, Jianhua
    Nie, Liqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2851 - 2863
  • [50] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255