A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval

Cited by: 1
Authors
Pan, Jiancheng [1]
Ma, Qing [1]
Bai, Cong [1]
Affiliation
[1] Zhejiang University of Technology, Hangzhou, People's Republic of China
Keywords
Image-Text Retrieval; Remote Sensing; Prior Instruction; Neural Network
DOI
10.1145/3581783.3612374
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper presents a prior instruction representation framework (PIR) for remote sensing image-text retrieval. PIR targets the semantic noise problem in remote sensing vision-language understanding tasks. Our main contribution is a paradigm that uses prior knowledge to instruct the adaptive learning of vision and text representations. Concretely, two progressive attention encoder (PAE) structures, Spatial-PAE and Temporal-PAE, are proposed to model long-range dependencies and strengthen key feature representations. For vision representation, Vision Instruction Representation (VIR), built on Spatial-PAE, exploits prior knowledge from remote sensing scene recognition: it builds a belief matrix that selects key features, reducing the impact of semantic noise. For text representation, Language Cycle Attention (LCA), built on Temporal-PAE, uses the previous time step to cyclically activate the current time step, enhancing the representational capability of text features. A cluster-wise affiliation loss is proposed to impose inter-class constraints and reduce semantic confusion zones in the common subspace. Comprehensive experiments show that prior knowledge instruction enhances both vision and text representations and that PIR outperforms state-of-the-art methods on two benchmark datasets, RSICD and RSITMD. Code is available at https://github.com/Zjut-MultimediaPlus/PIR-pytorch.
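The abstract names a cluster-wise affiliation loss that constrains inter-class structure in the common subspace, but it does not give the formula. The sketch below is a generic center-based surrogate, not the loss defined in PIR. It assumes that every image and caption embedding carries a scene-class label (for example, from the scene recognition prior). It pulls each embedding toward its class centroid and applies a hinge penalty to centroid pairs that lie closer than a margin. The names `cluster_affiliation_loss` and `margin` are hypothetical. The official formulation is in the authors' repository.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_affiliation_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 1.0,
) -> float:
    """Hypothetical center-based loss (an assumption, not the PIR formula).

    intra: mean squared distance of each embedding to its own class centroid,
           which keeps same-scene image and text embeddings compact.
    inter: mean over centroid pairs of max(0, margin - distance)^2,
           which pushes different scene clusters apart to shrink confusion zones.
    """
    # Group embeddings by their (assumed) scene-class label.
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for emb, lab in zip(embeddings, labels):
        groups[lab].append(list(emb))
    centers = {lab: centroid(vs) for lab, vs in groups.items()}

    # Intra-cluster compactness term.
    intra = sum(sq_dist(e, centers[l]) for e, l in zip(embeddings, labels)) / len(embeddings)

    # Inter-cluster separation term over all unordered centroid pairs.
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:
        return intra
    inter = sum(
        max(0.0, margin - math.sqrt(sq_dist(centers[a], centers[b]))) ** 2
        for a, b in pairs
    ) / len(pairs)
    return intra + inter


if __name__ == "__main__":
    # Two scene classes whose centroids are 3 units apart.
    embs = [[0.0, 0.0], [0.0, 2.0], [3.0, 0.0], [3.0, 2.0]]
    labs = [0, 0, 1, 1]
    print(cluster_affiliation_loss(embs, labs, margin=1.0))  # separated: only intra term
    print(cluster_affiliation_loss(embs, labs, margin=4.0))  # centroids too close: hinge active
```

Putting image and text embeddings in the same batch under shared labels makes the intra term pull both modalities toward one centroid per scene. That is one plausible reading of "reducing semantic confusion zones in the common subspace". A real implementation would operate on batched tensors in PyTorch instead of Python lists.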
Pages: 611-620
Page count: 10