Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6
Authors
Li, Yunpeng [1]
Zhang, Xiangrong [1]
Cheng, Xina [1]
Tang, Xu [1]
Jiao, Licheng [1]
Affiliation
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network
DOI
10.1016/j.patcog.2023.109893
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to obtain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors with semantic-level relational features into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.
Pages: 12