Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 6

Authors:

Li, Yunpeng [1]

Zhang, Xiangrong [1]

Cheng, Xina [1]

Tang, Xu [1]

Jiao, Licheng [1]

Affiliations:

[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 145

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;

DOI:

10.1016/j.patcog.2023.109893

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to gain coarse concept-related regions and region-related concepts for multimodal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.


Pages: 12

