Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6
Authors
Li, Yunpeng [1]
Zhang, Xiangrong [1]
Cheng, Xina [1]
Tang, Xu [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;
DOI
10.1016/j.patcog.2023.109893
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tremendous progress has been made in the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to gain coarse concept-related regions and region-related concepts for multimodal interaction. Furthermore, we incorporate the two types of attentive vectors with semantic-level relational features into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.
Pages: 12
Related papers
50 in total
  • [31] Aware-Transformer: A Novel Pure Transformer-Based Model for Remote Sensing Image Captioning
    Cao, Yukun
    Yan, Jialuo
    Tang, Yijia
    He, Zhenyi
    Xu, Kangle
    Cheng, Yu
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 105 - 117
  • [32] Contrastive semantic similarity learning for image captioning evaluation
    Zeng, Chao
    Kwong, Sam
    Zhao, Tiesong
    Wang, Hanli
    INFORMATION SCIENCES, 2022, 609 : 913 - 930
  • [33] Structural Semantic Adversarial Active Learning for Image Captioning
    Zhang, Beichen
    Li, Liang
    Su, Li
    Wang, Shuhui
    Deng, Jincan
    Zha, Zheng-Jun
    Huang, Qingming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1112 - 1121
  • [34] Image Captioning via Semantic Guidance Attention and Consensus Selection Strategy
    Wu, Jie
    Hu, Haifeng
    Wu, Yi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (04)
  • [35] Rotation-aware representation learning for remote sensing image retrieval
    Wu, Zhi-Ze
    Zou, Chang
    Wang, Yan
    Tan, Ming
    Weise, Thomas
    INFORMATION SCIENCES, 2021, 572 : 404 - 423
  • [36] Summarize and Search: Learning Consensus-aware Dynamic Convolution for Co-Saliency Detection
    Zhang, Ni
    Han, Junwei
    Liu, Nian
    Shao, Ling
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 4147 - 4156
  • [37] Multi-modal graph context extraction and consensus-aware learning for emotion in conversation
    Dai, Yijing
    Li, Jinxing
    Li, Yingjian
    Lu, Guangming
    KNOWLEDGE-BASED SYSTEMS, 2024, 298
  • [38] Remote Sensing Image Scene Classification by Multiple Granularity Semantic Learning
    Guo, Weilong
    Li, Shengyang
    Yang, Jian
    Zhou, Zhuang
    Liu, Yunfei
    Lu, Junjie
    Kou, Longxuan
    Zhao, Manqi
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 2546 - 2562
  • [39] Semantic Decoupled Representation Learning for Remote Sensing Image Change Detection
    Chen, Hao
    Zao, Yifan
    Liu, Liqin
    Chen, Song
    Shi, Zhenwei
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1051 - 1054
  • [40] Dual-Path Feature Aware Network for Remote Sensing Image Semantic Segmentation
    Geng, Jie
    Song, Shuai
    Jiang, Wen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3674 - 3686