CAPFORMER: PURE TRANSFORMER FOR REMOTE SENSING IMAGE CAPTION

Cited by: 10
Authors
Wang, Junjue [1]
Chen, Zihang [2]
Ma, Ailong [1]
Zhong, Yanfei [1]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Remote sensing image caption; Transformer
DOI
10.1109/IGARSS46834.2022.9883199
Chinese Library Classification (CLC) number
P [Astronomy, Earth Sciences]
Discipline classification code
07
Abstract
Accurately describing high-spatial-resolution remote sensing images requires understanding both the inner attributes of objects and the outer relations between different objects. Existing image captioning algorithms lack the ability to form global representations, which makes them ill-suited to summarizing complex scenes. To this end, we propose a pure transformer architecture (CapFormer) for remote sensing image captioning. Specifically, a scalable vision transformer is adopted for image representation, where the global content is captured with multi-head self-attention layers. A transformer decoder is designed to successively translate the image features into comprehensive sentences. The transformer decoder explicitly models the historical words and interacts with the image features using cross-attention layers. Comprehensive and ablation experiments on the RSICD dataset demonstrate that CapFormer outperforms state-of-the-art image captioning methods.
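The decoder behaviour described in the abstract (attending over previously generated words, then over image features through cross-attention) can be illustrated with a minimal sketch. This is not the CapFormer implementation: it assumes a single attention head, no learned projections, no layer normalization, and no feed-forward sublayer, and every name in it (`attention`, `decoder_block`, `patches`, `words`) is hypothetical. It only shows how causal self-attention and cross-attention fit together.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns the attended vectors and the attention weights.
    With causal=True, query i may only attend to positions 0..i.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys[:limit]
        ]
        # Masked (future) positions receive exactly zero weight.
        weights = softmax(scores) + [0.0] * (len(keys) - limit)
        all_weights.append(weights)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights


def add(xs, ys):
    """Element-wise residual connection over two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def decoder_block(word_vecs, patch_feats):
    """One simplified decoder block (assumption: no learned weights)."""
    # 1) Causal self-attention over the words generated so far.
    attended, self_w = attention(word_vecs, word_vecs, word_vecs, causal=True)
    hidden = add(word_vecs, attended)
    # 2) Cross-attention: each word position queries the image patch features.
    attended, cross_w = attention(hidden, patch_feats, patch_feats)
    hidden = add(hidden, attended)
    return hidden, self_w, cross_w


# Toy inputs: 4 image patch features and 3 word embeddings, all of dimension 3.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
words = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, 0.4]]

hidden, self_w, cross_w = decoder_block(words, patches)
```

In this sketch `self_w` is lower-triangular (the first word attends only to itself), while every row of `cross_w` spreads weight over all four patches, which is how a word position can draw on global image content.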
Pages: 7996-7999
Number of pages: 4