Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval

Citations: 0
Authors
Zhu, Lilu [1 ]
Wang, Yang [1 ]
Hu, Yanfeng [2 ]
Su, Xiaolu
Fu, Kun [2 ]
Affiliations
[1] Suzhou Aerosp Informat Res Inst, Suzhou 215123, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
Keywords
Remote sensing; Optical sensors; Optical imaging; Feature extraction; Semantics; Visualization; Image retrieval; Content-based remote sensing image retrieval (CBRSIR); correlation-aware retrieval; cross-modal contrastive learning; hash index code; hierarchical semantic tree; BENCHMARK; DATASET;
DOI
10.1109/TGRS.2024.3417421
CLC Classification
P3 [Geophysics]; P59 [Geochemistry];
Discipline Codes
0708 ; 070902 ;
Abstract
Optical satellites are the most widely used platforms for Earth observation. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and varied spatiotemporal resolutions. Most previous works focus on optical image representation and its transformation into the semantic space of retrieval via supervised or unsupervised learning. These retrieval methods fail to fully leverage geospatial information, especially spatiotemporal features, which can improve retrieval accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) that maximizes the mutual information across multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture with a vision encoder that operates on multiscale visual inputs and a lightweight text encoder that reconstructs spatiotemporal embeddings, and we adopt an intermediate contrastive objective on the representations from the two unimodal encoders. We then add a hash layer that transforms the deep fusion features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files, and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. Experimental results on three optical remote sensing datasets substantiate that CCLS2T improves retrieval performance by 11.64% over existing hash learning methods and by 9.91% over server-side retrieval engines in typical optical remote sensing retrieval scenarios.
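The abstract combines two standard mechanisms: a contrastive objective that aligns image embeddings with spatiotemporal (text) embeddings from unimodal encoders, and a hash layer that binarizes fused features into compact index codes. The NumPy sketch below illustrates both in their generic form; the function names and the use of precomputed arrays in place of the paper's deep encoders are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project embeddings onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss between paired
    image and spatiotemporal-text embeddings; matching pairs sit
    on the diagonal of the similarity matrix."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature          # (N, N) cosine similarities
    labels = np.arange(len(logits))

    def ce(l):  # row-wise softmax cross-entropy against the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average image-to-text and text-to-image directions.
    return 0.5 * (ce(logits) + ce(logits.T))

def hash_codes(fused_features):
    """Binarize fused features into compact hash index codes
    by sign thresholding (the simplest hash-layer quantizer)."""
    return (fused_features > 0).astype(np.uint8)
```

Aligned pairs (identical image/text embeddings) should yield a lower loss than mismatched ones, which is the signal the contrastive objective trains on; the binary codes then support fast Hamming-distance lookup at retrieval time.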
Pages: 21
Related Papers
50 records in total
  • [21] Structure-aware contrastive hashing for unsupervised cross-modal retrieval
    Cui, Jinrong
    He, Zhipeng
    Huang, Qiong
    Fu, Yulu
    Li, Yuting
    Wen, Jie
    NEURAL NETWORKS, 2024, 174
  • [22] Deep Cross-Modal Image-Voice Retrieval in Remote Sensing
    Chen, Yaxiong
    Lu, Xiaoqiang
    Wang, Shuai
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (10) : 7049 - 7061
  • [23] Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval
    Han, De
    Cheng, Xing
    Guo, Nan
    Ye, Xiaochun
    Rainer, Benjamin
    Priller, Peter
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5977 - 5994
  • [24] A NOVEL SELF-SUPERVISED CROSS-MODAL IMAGE RETRIEVAL METHOD IN REMOTE SENSING
    Sumbul, Gencer
    Mueller, Markus
    Demir, Begüm
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2426 - 2430
  • [25] Improving text-image cross-modal retrieval with contrastive loss
    Zhang, Chumeng
    Yang, Yue
    Guo, Junbo
    Jin, Guoqing
    Song, Dan
    Liu, An-An
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
  • [27] Image-Text Cross-Modal Retrieval with Instance Contrastive Embedding
    Zeng, Ruigeng
    Ma, Wentao
    Wu, Xiaoqian
    Liu, Wei
    Liu, Jie
    ELECTRONICS, 2024, 13 (02)
  • [28] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [29] Hypersphere-Based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning
    Zhang, Weihang
    Li, Jihao
    Li, Shuoke
    Chen, Jialiang
    Zhang, Wenkai
    Gao, Xin
    Sun, Xian
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [30] Cross-Modal Contrastive Learning for Text-to-Image Generation
    Zhang, Han
    Koh, Jing Yu
    Baldridge, Jason
    Lee, Honglak
    Yang, Yinfei
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 833 - 842