A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning

Cited by: 12
Authors
Ren, Zihao [1]
Gou, Shuiping [1]
Guo, Zhang [2]
Mao, Shasha [1]
Li, Ruimin [2]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
[2] Xidian Univ, Acad Adv Interdisciplinary Res, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
image captioning; remote sensing; transformer network; global semantic information; topic token; GENERATION;
DOI
10.3390/rs14122939
Chinese Library Classification number
X [Environmental science, safety science]
Discipline classification code
08; 0830
Abstract
Remote sensing image captioning aims to describe the content of images in natural language. Compared with natural images, the scale, distribution, and number of objects vary widely in remote sensing images, making it hard to capture global semantic information and the relationships between objects at different scales. To improve the accuracy and diversity of captioning, this paper proposes a mask-guided Transformer network with a topic token. Multi-head attention is used to extract features and capture the relationships between objects. On this basis, a topic token is added to the encoder; it represents the scene topic and serves as a prior in the decoder, helping the model focus on global semantic information. Moreover, a new Mask-Cross-Entropy strategy is designed to improve the diversity of the generated captions: during training it randomly replaces some input words with a special word (named [Mask]), with the aim of enhancing the model's learning ability and forcing it to explore uncommon word relations. Experiments on three data sets show that the proposed method generates captions with high accuracy and diversity and can outperform state-of-the-art models. In particular, the CIDEr score on the RSICD data set increased from 275.49 to 298.39.
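The random word replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_input_words`, the masking probability, and the `<bos>`/`<eos>` markers are all assumptions made for illustration. Only the special word `[Mask]` and the idea of replacing input words during training come from the abstract.

```python
import random

MASK_TOKEN = "[Mask]"  # the special word named in the abstract


def mask_input_words(words, mask_prob=0.15, rng=None, protected=("<bos>", "<eos>")):
    """Return a copy of `words` with some entries replaced by MASK_TOKEN.

    `mask_prob` and `protected` are hypothetical parameters; the abstract
    does not state how many words are replaced or which ones are exempt.
    """
    if rng is None:
        rng = random.Random()
    return [
        MASK_TOKEN if w not in protected and rng.random() < mask_prob else w
        for w in words
    ]


# Hypothetical caption; the decoder input is masked, the targets are not.
caption = ["<bos>", "many", "planes", "are", "parked", "near", "a", "terminal", "<eos>"]
decoder_input = mask_input_words(caption[:-1], mask_prob=0.3, rng=random.Random(0))
targets = caption[1:]
```

In this sketch the cross-entropy loss would still be computed against the unmasked `targets`, so the model has to predict each next word from partially hidden context. Whether the published method computes its loss this way is an assumption here.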
Pages: 22