Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6

Authors:

Li, Yunpeng [1]

Zhang, Xiangrong [1]

Cheng, Xina [1]

Tang, Xu [1]

Jiao, Licheng [1]

Affiliations:

[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 145

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;

DOI:

10.1016/j.patcog.2023.109893

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Tremendous progress has been made in the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to obtain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors with semantic-level relational features into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach compared with the reference methods.
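
The abstract does not give the equations of the VSCA module, so the following is only a generic illustration of what a two-way co-attention between region features and concept embeddings can look like. It is a minimal sketch under assumptions, not the authors' implementation: the function name `co_attention`, the scaled dot-product affinity, and the softmax weighting are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its attention weight."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def co_attention(regions, concepts):
    """Hypothetical two-way attention between regions and concepts.

    regions:  list of R visual region feature vectors (dimension d)
    concepts: list of C semantic concept embeddings (dimension d)
    Returns (concept_related_regions, region_related_concepts).
    """
    scale = math.sqrt(len(regions[0]))
    # affinity[i][j]: scaled similarity of region i and concept j (assumed form)
    affinity = [[dot(r, c) / scale for c in concepts] for r in regions]

    # For each concept, attend over all regions.
    concept_related_regions = []
    for j in range(len(concepts)):
        weights = softmax([affinity[i][j] for i in range(len(regions))])
        concept_related_regions.append(weighted_sum(weights, regions))

    # For each region, attend over all concepts.
    region_related_concepts = []
    for i in range(len(regions)):
        weights = softmax(affinity[i])
        region_related_concepts.append(weighted_sum(weights, concepts))

    return concept_related_regions, region_related_concepts


if __name__ == "__main__":
    toy_regions = [[1.0, 0.0], [0.0, 1.0]]   # two toy region features
    toy_concepts = [[1.0, 0.0]]              # one toy concept embedding
    cr, rc = co_attention(toy_regions, toy_concepts)
    print(len(cr), len(rc))
```

In this toy setup the single concept is aligned with the first region, so its attended region vector leans toward that region. How the paper actually combines such attentive vectors with relational features in the CE block is not specified in the abstract and is not modeled here.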

Pages: 12
