A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning

Cited by: 12
Authors
Ren, Zihao [1]
Gou, Shuiping [1]
Guo, Zhang [2]
Mao, Shasha [1]
Li, Ruimin [2]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
[2] Xidian Univ, Acad Adv Interdisciplinary Res, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
image captioning; remote sensing; transformer network; global semantic information; topic token; GENERATION;
DOI
10.3390/rs14122939
Chinese Library Classification number
X [Environmental Science, Safety Science];
Subject classification code
08 ; 0830 ;
Abstract
Remote sensing image captioning aims to describe the content of images using natural language. In contrast with natural images, the scale, distribution, and number of objects generally vary in remote sensing images, making it hard to capture global semantic information and the relationships between objects at different scales. In this paper, in order to improve the accuracy and diversity of captioning, a mask-guided Transformer network with a topic token is proposed. Multi-head attention is introduced to extract features and capture the relationships between objects. On this basis, a topic token is added into the encoder, which represents the scene topic and serves as a prior in the decoder to help us focus better on global semantic information. Moreover, a new Mask-Cross-Entropy strategy is designed in order to improve the diversity of the generated captions, which randomly replaces some input words with a special word (named [Mask]) in the training stage, with the aim of enhancing the model's learning ability and forcing exploration of uncommon word relations. Experiments on three data sets show that the proposed method can generate captions with high accuracy and diversity, and the experimental results illustrate that the proposed method can outperform state-of-the-art models. Furthermore, the CIDEr score on the RSICD data set increased from 275.49 to 298.39.
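The random word replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_input_words`, the masking probability `mask_prob`, the `<bos>`/`<eos>` markers, and the choice to leave those markers unmasked are all assumptions made for illustration. The abstract states only that some input words are randomly replaced with `[Mask]` during training.

```python
import random

MASK_TOKEN = "[Mask]"  # special word named in the abstract


def mask_input_words(words, mask_prob=0.15, rng=None, keep=("<bos>", "<eos>")):
    """Randomly replace input words with MASK_TOKEN (training-time only).

    Hypothetical helper: mask_prob and the protected `keep` tokens are
    assumptions, not values taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    masked = []
    for word in words:
        # Sentence-boundary markers are left untouched in this sketch.
        if word not in keep and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
        else:
            masked.append(word)
    return masked


if __name__ == "__main__":
    caption = ["<bos>", "many", "planes", "are", "parked", "<eos>"]
    # The masked sequence would feed the decoder; in this sketch the
    # training target is assumed to remain the original caption.
    print(mask_input_words(caption, mask_prob=0.3, rng=random.Random(0)))
```

In this sketch only the decoder input is corrupted, so the model would have to predict each next word without always seeing the true preceding word, which is one plausible reading of how the strategy encourages less common word relations.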
Pages: 22
Related papers
50 in total
  • [1] Mask-guided network for image captioning
    Lim, Jian Han
    Chan, Chee Seng
    [J]. PATTERN RECOGNITION LETTERS, 2023, 173 : 79 - 86
  • [2] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [3] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Yang, Yang
    Xiao, Liang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 13
  • [4] Retrieval Topic Recurrent Memory Network for Remote Sensing Image Captioning
    Wang, Binqiang
    Zheng, Xiangtao
    Qu, Bo
    Lu, Xiaoqiang
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2020, 13 : 256 - 270
  • [5] Cooperative Connection Transformer for Remote Sensing Image Captioning
    Zhao, Kai
    Xiong, Wei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [6] Interactive Change-Aware Transformer Network for Remote Sensing Image Change Captioning
    Cai, Chen
    Wang, Yi
    Yap, Kim-Hui
    [J]. REMOTE SENSING, 2023, 15 (23)
  • [7] Mask-Guided Local–Global Attentive Network for Change Detection in Remote Sensing Images
    Xiong, Fengchao
    Li, Tianhan
    Chen, Jingzhou
    Zhou, Jun
    Qian, Yuntao
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 3366 - 3378
  • [8] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
    Kandala, Hitesh
    Saha, Sudipan
    Banerjee, Biplab
    Zhu, Xiao Xiang
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [9] MGQFormer: Mask-Guided Query-Based Transformer for Image Manipulation Localization
    Zeng, Kunlun
    Cheng, Ri
    Tan, Weimin
    Yan, Bo
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 6944 - 6952
  • [10] Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
    Cai, Yuanhao
    Lin, Jing
    Hu, Xiaowan
    Wang, Haoqian
    Yuan, Xin
    Zhang, Yulun
    Timofte, Radu
    Van Gool, Luc
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17481 - 17490