Towards a Multimodal Framework for Remote Sensing Image Change Retrieval and Captioning

Authors
Ferrod, Roger [1]
Di Caro, Luigi [1]
Ienco, Dino [2,3]
Affiliations
[1] Univ Turin, Turin, Italy
[2] Univ Montpellier, INRAE, UMR TETIS, Montpellier, France
[3] Univ Montpellier, INRIA, Montpellier, France
Keywords
Remote sensing; bi-temporal change detection; image captioning; text-image retrieval; contrastive learning
DOI
10.1007/978-3-031-78980-9_15
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to enable natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, specific data modalities such as remote sensing (RS) data remain under-investigated. Despite the many potential applications of RS data, including environmental protection, disaster monitoring and land planning, available solutions focus predominantly on individual tasks such as classification, captioning and retrieval. These solutions often overlook the unique characteristics of RS data, such as its capability to systematically provide information on the same geographical areas over time, which enables continuous monitoring of changes in the underlying landscape. To address this gap, we propose a novel foundation model for bi-temporal RS image pairs in the context of change detection analysis, leveraging contrastive learning and the LEVIR-CC dataset for both captioning and text-image retrieval. By jointly training a contrastive encoder and a captioning decoder, our model adds text-image retrieval capabilities to bi-temporal change detection while maintaining captioning performance comparable to the state of the art. We release the source code and pretrained weights at: https://github.com/rogerferrod/RSICRC.
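The abstract describes jointly training a contrastive encoder for bi-temporal image pairs and text together with a captioning decoder. The following is a minimal, dependency-free Python sketch of what such a joint objective could look like. It assumes a CLIP-style symmetric InfoNCE loss with in-batch negatives plus a teacher-forced captioning negative log-likelihood, summed with a weight. This is not the authors' implementation (the released repository is the reference for that). The pair fusion by feature difference (`fuse_pair`), the temperature of 0.07, `caption_weight`, and the `retrieve` helper are hypothetical choices made for illustration, and embeddings are plain Python lists standing in for encoder outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse_pair(before: Sequence[float], after: Sequence[float]) -> Vector:
    """HYPOTHETICAL fusion of a bi-temporal pair: the feature difference after - before."""
    return [a - b for a, b in zip(after, before)]


def log_softmax(row: Sequence[float]) -> Vector:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def symmetric_info_nce(pair_embs, text_embs, temperature=0.07) -> float:
    """CLIP-style symmetric InfoNCE: the i-th pair and the i-th caption are the positive match."""
    pairs = [l2_normalize(v) for v in pair_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(pairs)
    # Cosine-similarity logits divided by the (assumed) temperature.
    logits = [[sum(p * t for p, t in zip(pv, tv)) / temperature for tv in texts] for pv in pairs]
    # Pair -> text direction: softmax over each row, target on the diagonal.
    loss_p2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> pair direction: softmax over each column, target on the diagonal.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_p2t + loss_t2p) / 2


def caption_nll(token_log_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the ground-truth caption tokens (teacher forcing)."""
    return -sum(token_log_probs) / len(token_log_probs)


def joint_loss(before, after, text_embs, token_log_probs,
               temperature=0.07, caption_weight=1.0) -> float:
    """Contrastive term plus a weighted captioning term (caption_weight is HYPOTHETICAL)."""
    pair_embs = [fuse_pair(b, a) for b, a in zip(before, after)]
    return (symmetric_info_nce(pair_embs, text_embs, temperature)
            + caption_weight * caption_nll(token_log_probs))


def retrieve(query_emb, pair_embs, k=1) -> List[int]:
    """HYPOTHETICAL text-to-pair retrieval: indices of the k pairs most similar to the query."""
    q = l2_normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(p))) for p in pair_embs]
    return sorted(range(len(pair_embs)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    before = [[0.0, 0.0], [0.0, 0.0]]
    after = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]  # caption i describes pair i
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # captions mismatched
    pairs = [fuse_pair(b, a) for b, a in zip(before, after)]
    print(symmetric_info_nce(pairs, aligned) < symmetric_info_nce(pairs, swapped))  # True
    print(retrieve([0.9, 0.1], pairs, k=1))  # [0]
```

In this sketch the same normalized embeddings drive both the contrastive loss and `retrieve`, so ranking image pairs for a caption query reduces to one similarity computation, while the decoder term is what keeps the model usable for captioning.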
Pages: 231-245
Number of pages: 15