Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6

Authors:

Li, Yunpeng [1]

Zhang, Xiangrong [1]

Cheng, Xina [1]

Tang, Xu [1]

Jiao, Licheng [1]

Affiliations:

[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 145

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;

DOI:

10.1016/j.patcog.2023.109893

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to obtain coarse concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.

Pages: 12

50 entries in total

[21] Event-Aware Retrospective Learning for Knowledge-Based Image Captioning
Liu, An-An
Zhai, Yingchen
Xu, Ning
Tian, Hongshuo
Nie, Weizhi
Zhang, Yongdong
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4898 - 4911
[22] Self-Learning for Few-Shot Remote Sensing Image Captioning
Zhou, Haonan
Du, Xiaoping
Xia, Lurui
Li, Sen
REMOTE SENSING, 2022, 14 (18)
[23] Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion
Zhao, An
Yang, Wenzhong
Chen, Danny
Wei, Fuyuan
ELECTRONICS, 2024, 13 (18)
[24] Incorporating object counts into remote sensing image captioning
Ni, Zihao
Zong, Zhaoyun
Ren, Peng
INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
[25] Intensive Positioning Network for Remote Sensing Image Captioning
Wang, Shengsheng
Chen, Jiawei
Wang, Guangyao
INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 567 - 576
[26] Multiscale Multiinteraction Network for Remote Sensing Image Captioning
Wang, Yong
Zhang, Wenkai
Zhang, Zhengyuan
Gao, Xin
Sun, Xian
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 2154 - 2165
[27] Structural Representative Network for Remote Sensing Image Captioning
Sharma, Jaya
Divya, Peketi
Sravani, Yenduri
Shekar, B. H.
Mohan, Krishna C.
FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701
[28] Exploring region features in remote sensing image captioning
Zhao, Kai
Xiong, Wei
INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 127
[29] Cooperative Connection Transformer for Remote Sensing Image Captioning
Zhao, Kai
Xiong, Wei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
[30] GLCM: Global-Local Captioning Model for Remote Sensing Image Captioning
Wang, Qi
Huang, Wei
Zhang, Xueting
Li, Xuelong
IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (11) : 6910 - 6922
