TSFE: Two-Stage Feature Enhancement for Remote Sensing Image Captioning

Cited by: 3

Authors:

Guo, Jie [1]

Li, Ze [1]

Song, Bin [1]

Chi, Yuhao [1]

Affiliations:

[1] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710071, Peoples R China

Source:

REMOTE SENSING | 2024, Volume 16, Issue 11

Keywords:

attention mechanism; fine-grained feature; two-stage enhancement; remote sensing image captioning; feature interaction decoder; FUSION;

DOI:

10.3390/rs16111843

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

In the field of remote sensing image captioning (RSIC), mainstream methods typically adopt an encoder-decoder framework. Methods based on this framework often use only simple feature fusion strategies and fail to fully mine the fine-grained features of the remote sensing image. Moreover, the decoder lacks context information, which makes the generated sentences less accurate. To address these problems, we propose a two-stage feature enhancement model (TSFE) for remote sensing image captioning. In the first stage, we adopt an adaptive feature fusion strategy to acquire multi-scale features. In the second stage, we further mine fine-grained features from the multi-scale features by establishing associations between different regions of the image. In addition, we introduce global features carrying scene information into the decoder to help generate descriptions. Experimental results on the RSICD, UCM-Captions, and Sydney-Captions datasets demonstrate that the proposed method outperforms existing state-of-the-art approaches.

Pages: 19

50 items in total

[41] FEST: Feature Enhancement Swin Transformer for Remote Sensing Image Semantic Segmentation
Zhang, Ronghuan
Zhao, Jing
Li, Ming
Zou, Qingzhi
PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024: 1177-1182
[42] Remote Sensing Image Scene Classification Based on Multidimensional Attention and Feature Enhancement
Liu, Chengrui
Dai, Hong
Wang, Shuang
Chen, Junhong
IAENG International Journal of Computer Science, 2023, 50 (04)
[43] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
Kandala, Hitesh
Saha, Sudipan
Banerjee, Biplab
Zhu, Xiao Xiang
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
[44] REMOTE SENSING IMAGE CAPTIONING WITH SVM-BASED DECODING
Hoxha, Genc
Melgani, Farid
IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020: 6734-6737
[45] Region-guided transformer for remote sensing image captioning
Zhao, Kai
Xiong, Wei
INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
[46] Truncation Cross Entropy Loss for Remote Sensing Image Captioning
Li, Xuelong
Zhang, Xueting
Huang, Wei
Wang, Qi
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2021, 59 (06): 5246-5257
[47] Sound Active Attention Framework for Remote Sensing Image Captioning
Lu, Xiaoqiang
Wang, Binqiang
Zheng, Xiangtao
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (03): 1985-2000
[48] Multiscale Methods for Optical Remote-Sensing Image Captioning
Ma, Xiaofeng
Zhao, Rui
Shi, Zhenwei
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2021, 18 (11): 2001-2005
[49] Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning
Li, Yunpeng
Zhang, Xiangrong
Gu, Jing
Li, Chen
Wang, Xin
Tang, Xu
Jiao, Licheng
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[50] A Two-Stage Deep Learning Registration Method for Remote Sensing Images Based on Sub-Image Matching
Chen, Yuan
Jiang, Jie
REMOTE SENSING, 2021, 13 (17)
