Improving Remote Sensing Image Captioning by Combining Grid Features and Transformer

Cited by: 15
Authors
Zhuang, Shuo [1 ,2 ]
Wang, Ping [3 ]
Wang, Gang [2 ,3 ]
Wang, Di [3 ]
Chen, Jinyong [2 ]
Gao, Feng [2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Peoples R China
[2] CETC Key Lab Aerosp Informat Applicat, Shijiazhuang 050081, Hebei, Peoples R China
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Transformers; Decoding; Visualization; Training; Measurement; Semantics; Convolutional neural network (CNN); image captioning; remote sensing; transformer; MODELS
DOI
10.1109/LGRS.2021.3135711
CLC Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline Code
0708; 070902
Abstract
Remote sensing image captioning (RSIC), which describes image content in natural language, is of great significance for image understanding. Existing methods are mainly based on deep learning and rely on an encoder-decoder model to generate sentences; in the decoding stage, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks are typically used to generate captions word by word. In this letter, a transformer encoder-decoder is combined with grid features to improve RSIC performance. First, a pretrained convolutional neural network (CNN) is used to extract grid-based visual features, which are encoded as vectorial representations. Then, the transformer maps these visual features to semantic descriptions, bridging vision and natural language. In addition, the self-critical sequence training (SCST) strategy is applied to further optimize the captioning model and improve the quality of the generated sentences. Extensive experiments are conducted on three public datasets: RSICD, UCM-Captions, and Sydney-Captions. The results demonstrate the effectiveness of the SCST strategy, and the proposed method achieves superior performance compared with state-of-the-art image captioning approaches on the RSICD dataset.
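The abstract outlines a three-step pipeline: a pretrained CNN extracts grid features, a transformer encoder-decoder turns them into a caption, and SCST fine-tunes the model with a sentence-level reward. The following PyTorch sketch only illustrates that pipeline; it is not the authors' implementation, and the ResNet-50 backbone, model dimensions, vocabulary size, and the scst_loss helper are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class GridFeatureEncoder(nn.Module):
    """Extract grid-based visual features with a pretrained CNN (ResNet-50 here)."""

    def __init__(self, d_model=512):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V1")
        # Keep the convolutional stages only; drop global pooling and the
        # classifier so the spatial grid of features is preserved.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)  # project each grid cell to d_model

    def forward(self, images):                    # images: (B, 3, H, W)
        grid = self.cnn(images)                   # (B, 2048, h, w) feature map
        grid = grid.flatten(2).permute(0, 2, 1)   # (B, h*w, 2048) grid sequence
        return self.proj(grid)                    # (B, h*w, d_model)


class GridTransformerCaptioner(nn.Module):
    """Transformer encoder-decoder that bridges grid features and word logits."""

    def __init__(self, vocab_size=10000, d_model=512, nhead=8, num_layers=3):
        super().__init__()
        self.grid_encoder = GridFeatureEncoder(d_model)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.fc_out = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):          # captions: (B, T) token ids
        memory = self.grid_encoder(images)
        tgt = self.embed(captions)
        # Causal mask so each position attends only to earlier words.
        tgt_mask = self.transformer.generate_square_subsequent_mask(captions.size(1))
        out = self.transformer(memory, tgt, tgt_mask=tgt_mask)
        return self.fc_out(out)                   # (B, T, vocab_size)


def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """SCST policy-gradient loss: the sampled caption's sentence-level reward
    (e.g., CIDEr) is baselined by the reward of the greedy-decoded caption."""
    advantage = sample_reward - greedy_reward            # (B,)
    return -(advantage.unsqueeze(1) * sample_logprobs).mean()


# Cross-entropy pretraining step; SCST fine-tuning would replace this loss
# with scst_loss computed from sampled and greedy captions.
model = GridTransformerCaptioner()
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 20))
logits = model(images, captions)
# For brevity the sketch reuses the input tokens as targets; a real setup
# shifts the caption by one position and masks padding.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), captions.reshape(-1))
```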
Pages: 5
Related Papers
50 records in total (first 10 listed)
  • [1] Cooperative Connection Transformer for Remote Sensing Image Captioning
    Zhao, Kai
    Xiong, Wei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62: 1-14
  • [2] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
    Kandala, Hitesh
    Saha, Sudipan
    Banerjee, Biplab
    Zhu, Xiao Xiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [3] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [4] Exploring region features in remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 127
  • [5] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Yang, Yang
    Xiao, Liang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61: 1-13
  • [6] A Multiscale Grouping Transformer With CLIP Latents for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Meng, Ran
    Yang, Yang
    Xiao, Liang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62: 1-15
  • [7] Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer
    Liu, Chenyang
    Zhao, Rui
    Shi, Zhenwei
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [8] From Plane to Hierarchy: Deformable Transformer for Remote Sensing Image Captioning
    Du, Runyan
    Cao, Wei
    Zhang, Wenkai
    Zhi, Guo
    Sun, Xian
    Li, Shuoke
    Li, Jihao
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16: 7704-7717
  • [9] Transformer with multi-level grid features and depth pooling for image captioning
    Bui, Doanh C.
    Nguyen, Tam V.
    Nguyen, Khang
    MACHINE VISION AND APPLICATIONS, 2024, 35 (05)
  • [10] Aware-Transformer: A Novel Pure Transformer-Based Model for Remote Sensing Image Captioning
    Cao, Yukun
    Yan, Jialuo
    Tang, Yijia
    He, Zhenyi
    Xu, Kangle
    Cheng, Yu
    ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495: 105-117