Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6
Authors
Li, Yunpeng [1]
Zhang, Xiangrong [1]
Cheng, Xina [1]
Tang, Xu [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;
DOI
10.1016/j.patcog.2023.109893
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to obtain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors with semantic-level relational features into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.
Pages: 12
Related papers
50 in total
  • [41] FALSE: False Negative Samples Aware Contrastive Learning for Semantic Segmentation of High-Resolution Remote Sensing Image
    Zhang, Zhaoyang
    Wang, Xuying
    Mei, Xiaoming
    Tao, Chao
    Li, Haifeng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [42] Knowledge-Aware Text-Image Retrieval for Remote Sensing Images
    Mi, Li
    Dai, Xianjie
    Castillo-Navarro, Javiera
    Tuia, Devis
    IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [43] Geographic knowledge graph-guided remote sensing image semantic segmentation
    Li Y.
    Wu K.
    Ouyang S.
    Yang K.
    Li H.
    Zhang Y.
    National Remote Sensing Bulletin, 2024, 28 (02) : 455 - 469
  • [44] Feature refinement and rethinking attention for remote sensing image captioning
    Yunpeng Li
    Chengjin Tao
    Meng Liu
    Xiangrong Zhang
    Guanchun Wang
    Tianyang Zhang
    Dong Zhao
    Dabao Wang
    Scientific Reports, 15 (1)
  • [45] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [46] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
    Kandala, Hitesh
    Saha, Sudipan
    Banerjee, Biplab
    Zhu, Xiao Xiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [47] REMOTE SENSING IMAGE CAPTIONING WITH SVM-BASED DECODING
    Hoxha, Genc
    Melgani, Farid
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020 : 6734 - 6737
  • [48] Sound Active Attention Framework for Remote Sensing Image Captioning
    Lu, Xiaoqiang
    Wang, Binqiang
    Zheng, Xiangtao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (03) : 1985 - 2000
  • [49] Truncation Cross Entropy Loss for Remote Sensing Image Captioning
    Li, Xuelong
    Zhang, Xueting
    Huang, Wei
    Wang, Qi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (06) : 5246 - 5257
  • [50] Multiscale Methods for Optical Remote-Sensing Image Captioning
    Ma, Xiaofeng
    Zhao, Rui
    Shi, Zhenwei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (11) : 2001 - 2005