Improving Remote Sensing Image Captioning by Combining Grid Features and Transformer

Cited by: 15
Authors
Zhuang, Shuo [1 ,2 ]
Wang, Ping [3 ]
Wang, Gang [2 ,3 ]
Wang, Di [3 ]
Chen, Jinyong [2 ]
Gao, Feng [2 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Peoples R China
[2] CETC Key Lab Aerosp Informat Applicat, Shijiazhuang 050081, Hebei, Peoples R China
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Transformers; Decoding; Visualization; Training; Measurement; Semantics; Convolutional neural network (CNN); image captioning; remote sensing; transformer
DOI
10.1109/LGRS.2021.3135711
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Subject Classification Code
0708; 070902
Abstract
Remote sensing image captioning (RSIC), which describes image content in natural language, is of great significance for image understanding. Existing methods are mainly based on deep learning and rely on the encoder-decoder framework to generate sentences. In the decoding stage, recurrent neural networks (RNNs) and long short-term memory (LSTM) networks are typically applied to generate image captions sequentially. In this letter, a transformer encoder-decoder is combined with grid features to improve RSIC performance. First, a pretrained convolutional neural network (CNN) is used to extract grid-based visual features, which are encoded as vectorial representations. Then, the transformer outputs semantic descriptions, bridging visual features and natural language. In addition, the self-critical sequence training (SCST) strategy is applied to further optimize the captioning model and improve the quality of the generated sentences. Extensive experiments are conducted on three public datasets: RSICD, UCM-Captions, and Sydney-Captions. The results demonstrate the effectiveness of the SCST strategy, and the proposed method achieves superior performance compared with state-of-the-art image captioning approaches on the RSICD dataset.
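The sketch below is one plausible PyTorch reading of the pipeline the abstract describes, not the authors' released code: a pretrained CNN backbone supplies grid features that a transformer encoder-decoder maps to caption tokens. The ResNet-50 backbone, model width, vocabulary size, and the class name GridFeatureCaptioner are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GridFeatureCaptioner(nn.Module):
    # Hypothetical sketch: pretrained CNN grid features + transformer
    # encoder-decoder, as outlined in the abstract. All sizes are assumptions.
    def __init__(self, vocab_size=10000, d_model=512, nhead=8,
                 num_layers=3, max_len=64):
        super().__init__()
        # Pretrained backbone; dropping avgpool/fc keeps the final conv map
        # (7x7x2048 for a 224x224 input) as the grid of visual features.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Linear(2048, d_model)           # grid cell -> model width
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)  # learned token positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) token ids
        feats = self.cnn(images)                 # (B, 2048, h, w)
        grid = feats.flatten(2).transpose(1, 2)  # (B, h*w, 2048)
        memory = self.proj(grid)                 # encoder input; grid positional
                                                 # encodings omitted for brevity
        T = captions.size(1)
        pos = torch.arange(T, device=captions.device)
        tgt = self.token_emb(captions) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier tokens.
        causal = torch.triu(
            torch.full((T, T), float('-inf'), device=captions.device),
            diagonal=1)
        hidden = self.transformer(memory, tgt, tgt_mask=causal)
        return self.out(hidden)                  # (B, T, vocab_size)

model = GridFeatureCaptioner()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```

After cross-entropy pretraining, SCST fine-tunes such a model with a policy gradient, using the reward (e.g., CIDEr) of a sampled caption minus the reward of the greedily decoded caption as the baseline-corrected learning signal.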
Pages: 5