TSFE: Two-Stage Feature Enhancement for Remote Sensing Image Captioning

Cited by: 3
Authors
Guo, Jie [1]
Li, Ze [1]
Song, Bin [1]
Chi, Yuhao [1]
Affiliations
[1] Xidian University, State Key Laboratory of Integrated Services Networks, Xi'an 710071, People's Republic of China
Keywords
attention mechanism; fine-grained feature; two-stage enhancement; remote sensing image captioning; feature interaction decoder; fusion
DOI
10.3390/rs16111843
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
In remote sensing image captioning (RSIC), mainstream methods typically adopt an encoder-decoder framework. Methods built on this framework often rely on simple feature fusion strategies and therefore fail to fully mine the fine-grained features of remote sensing images. Moreover, the decoder lacks context information, which makes the generated sentences less accurate. To address these problems, we propose a two-stage feature enhancement model (TSFE) for remote sensing image captioning. In the first stage, we adopt an adaptive feature fusion strategy to acquire multi-scale features. In the second stage, we further mine fine-grained features from the multi-scale features by establishing associations between different regions of the image. In addition, we introduce global features carrying scene information into the decoder to help generate descriptions. Experimental results on the RSICD, UCM-Captions, and Sydney-Captions datasets demonstrate that the proposed method outperforms existing state-of-the-art approaches.
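The two stages and the global scene feature described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the fusion rule or the attention form, so the code assumes a softmax-weighted sum across scales for stage one, plain scaled dot-product self-attention with a residual connection (no learned projections) for stage two, and mean pooling for the global feature. All function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(scale_features, scale_logits):
    """Stage 1 (assumed form): softmax-weighted sum over scales.

    scale_features: list of S feature sets, each a list of N region
    vectors of dimension D (already resized to a common shape).
    scale_logits: S scores; learned in a real model, fixed here.
    """
    weights = softmax(scale_logits)
    n_regions = len(scale_features[0])
    dim = len(scale_features[0][0])
    fused = [[0.0] * dim for _ in range(n_regions)]
    for w, feats in zip(weights, scale_features):
        for i in range(n_regions):
            for d in range(dim):
                fused[i][d] += w * feats[i][d]
    return fused


def region_self_attention(regions):
    """Stage 2 (assumed form): each region attends to all regions.

    Uses scaled dot-product attention without learned projections and
    adds the attended context back to the region (residual).
    """
    dim = len(regions[0])
    enhanced = []
    for query in regions:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in regions
        ]
        attn = softmax(scores)
        context = [
            sum(a * value[d] for a, value in zip(attn, regions))
            for d in range(dim)
        ]
        enhanced.append([q + c for q, c in zip(query, context)])
    return enhanced


def global_feature(regions):
    """Assumed scene-level feature: mean pooling over regions."""
    n = len(regions)
    return [sum(r[d] for r in regions) / n for d in range(len(regions[0]))]


# Two scales, two regions, two feature dimensions (made-up values).
scales = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
fused = adaptive_fusion(scales, [0.0, 0.0])
enhanced = region_self_attention(fused)
scene = global_feature(enhanced)
```

In a captioning model along these lines, `enhanced` would feed the decoder's attention and `scene` would be supplied as additional context at each decoding step; how TSFE actually wires these together is described in the paper itself, not here.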
Pages: 19