A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval

Cited by: 1
Authors
Pan, Jiancheng [1]
Ma, Qing [1]
Bai, Cong [1]
Affiliations
[1] Zhejiang University of Technology, Hangzhou, People's Republic of China
Keywords
Image-Text Retrieval; Remote Sensing; Prior Instruction; Neural Network
DOI
10.1145/3581783.3612374
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a prior instruction representation (PIR) framework for remote sensing image-text retrieval, which targets the semantic noise problem in remote sensing vision-language understanding tasks. Our highlight is a paradigm that draws on prior knowledge to instruct adaptive learning of vision and text representations. Concretely, two progressive attention encoder (PAE) structures, Spatial-PAE and Temporal-PAE, are proposed to perform long-range dependency modeling and enhance key feature representation. In vision representation, Vision Instruction Representation (VIR), based on Spatial-PAE, exploits prior knowledge from remote sensing scene recognition by building a belief matrix to select key features, reducing the impact of semantic noise. In text representation, Language Cycle Attention (LCA), based on Temporal-PAE, uses the previous time step to cyclically activate the current time step, enhancing text representation capability. A cluster-wise affiliation loss is proposed to constrain the inter-classes and reduce the semantic confusion zones in the common subspace. Comprehensive experiments demonstrate that prior knowledge instruction enhances vision and text representations and outperforms state-of-the-art methods on two benchmark datasets, RSICD and RSITMD. Code is available at https://github.com/Zjut-MultimediaPlus/PIR-pytorch.
Pages: 611-620
Page count: 10