Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6
Authors
Li, Yunpeng [1]
Zhang, Xiangrong [1]
Cheng, Xina [1]
Tang, Xu [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;
DOI
10.1016/j.patcog.2023.109893
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, an end-to-end trainable semantic concept extractor (SCE) precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to gain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show that our approach outperforms the reference methods.
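The abstract mentions a co-attention that yields both concept-related regions and region-related concepts. The following is only a minimal sketch of a generic dot-product co-attention, not the authors' VSCA: the function names (`co_attention`, `softmax`, `dot`, `weighted_sum`), the use of plain dot-product affinity, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def co_attention(regions, concepts):
    """Hypothetical co-attention between region and concept features.

    regions:  list of n visual feature vectors (assumed same dimension d)
    concepts: list of m semantic concept embeddings (assumed dimension d)
    """
    # affinity[i][j] scores how well region i matches concept j.
    affinity = [[dot(r, c) for c in concepts] for r in regions]

    # For each concept, attend over all regions: concept-related regions.
    concept_related_regions = [
        weighted_sum(
            softmax([affinity[i][j] for i in range(len(regions))]), regions
        )
        for j in range(len(concepts))
    ]

    # For each region, attend over all concepts: region-related concepts.
    region_related_concepts = [
        weighted_sum(softmax(affinity[i]), concepts)
        for i in range(len(regions))
    ]
    return concept_related_regions, region_related_concepts
```

Called with three toy region vectors and two toy concept vectors, this returns two attended region vectors (one per concept) and three attended concept vectors (one per region). How the paper actually computes the affinity and combines the attended vectors is not stated in this record.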
Pages: 12
Related papers
50 in total
  • [1] Vision-Enhanced and Consensus-Aware Transformer for Image Captioning
    Cao, Shan
    An, Gaoyun
    Zheng, Zhenxing
    Wang, Zhiyong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 7005 - 7018
  • [2] Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning
    Li, Yunpeng
    Zhang, Xiangrong
    Gu, Jing
    Li, Chen
    Wang, Xin
    Tang, Xu
    Jiao, Licheng
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [3] Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
    Zhu, Yongshuo
    Li, Lu
    Chen, Keyan
    Liu, Chenyang
    Zhou, Fugen
    Shi, Zhenwei
    [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [4] Meta captioning: A meta learning based remote sensing image captioning framework
    Yang, Qiaoqiao
    Ni, Zihao
    Ren, Peng
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200
  • [5] Semantic-Aware Dense Representation Learning for Remote Sensing Image Change Detection
    Chen, Hao
    Li, Wenyuan
    Chen, Song
    Shi, Zhenwei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [6] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Yang, Yang
    Xiao, Liang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 13
  • [7] Multi-label semantic feature fusion for remote sensing image captioning
    Wang, Shuang
    Ye, Xiutiao
    Gu, Yu
    Wang, Jihui
    Meng, Yun
    Tian, Jingxian
    Hou, Biao
    Jiao, Licheng
    [J]. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 184 : 1 - 18
  • [8] Progressive Scale-Aware Network for Remote Sensing Image Change Captioning
    Liu, Chenyang
    Yang, Jiajun
    Qi, Zipeng
    Zou, Zhengxia
    Shi, Zhenwei
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023 : 6668 - 6671
  • [9] Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning
    Wang, Qi
    Yang, Zhigang
    Ni, Weiping
    Wu, Junzheng
    Li, Qiang
    [J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62
  • [10] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
    Li, Zhengxin
    Zhao, Wenzhe
    Du, Xuanyi
    Zhou, Guangyao
    Zhang, Songlin
    [J]. REMOTE SENSING, 2024, 16 (01)