Captioning Remote Sensing Images Using Transformer Architecture

Cited by: 2
Authors
Nanal, Wrucha [1]
Hajiarbabi, Mohammadreza [1]
Affiliations
[1] Purdue Univ, Ft Wayne, IN 46805 USA
Keywords
DOI
10.1109/ICAIIC57133.2023.10067039
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning aims to have machines describe images, combining the fields of Computer Vision (CV) and Natural Language Processing (NLP). The current state of the art for image captioning uses the attention-based encoder-decoder model. This model uses an attention mechanism that focuses on a particular region of the image to generate the corresponding caption word. The NLP side of the model uses Long Short-Term Memory (LSTM) for word generation. Attention-based models do not emphasize the relative arrangement of words in a caption, thereby ignoring the context of the sentence. Inspired by the versatility of Transformers in NLP, this work applies their architectural features to the image captioning use case. This work also makes use of a pretrained Bidirectional Encoder Representations from Transformers (BERT) model, which generates a contextually rich embedding of a caption. The Multi-Head Attention of the Transformer establishes a strong correlation between the image and the contextually aware caption. The experiment is performed on the Remote Sensing Image Captioning Dataset. The results of the model are evaluated using NLP evaluation metrics: Bilingual Evaluation Understudy (BLEU-1 through BLEU-4), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The proposed model shows better results on a few of the metrics.
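As an illustration of the BLEU-1 through BLEU-4 scores named in the abstract, the following is a minimal sketch of sentence-level BLEU using clipped n-gram precision and a brevity penalty. It is not the authors' evaluation code: the function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical, tokenization is assumed to be plain whitespace splitting, and no smoothing is applied, so any zero n-gram precision yields a score of 0.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the single reference that contains it most."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Cumulative BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties pick the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical caption pair, for illustration only.
reference = "many planes are parked at the airport".split()
candidate = "many planes are parked near the airport".split()
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, [reference], max_n=n):.3f}")
```

Setting `max_n` to 1, 2, 3, or 4 gives the cumulative scores usually reported as BLEU-1 through BLEU-4. Library implementations typically add smoothing and corpus-level aggregation, so their numbers can differ from this sketch.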
Pages: 413-418
Page count: 6
Related papers
50 in total
  • [1] Cooperative Connection Transformer for Remote Sensing Image Captioning
    Zhao, Kai
    Xiong, Wei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [2] Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
    Kandala, Hitesh
    Saha, Sudipan
    Banerjee, Biplab
    Zhu, Xiao Xiang
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [3] Region-guided transformer for remote sensing image captioning
    Zhao, Kai
    Xiong, Wei
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [4] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Yang, Yang
    Xiao, Liang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 13
  • [5] A Multiscale Grouping Transformer With CLIP Latents for Remote Sensing Image Captioning
    Meng, Lingwu
    Wang, Jing
    Meng, Ran
    Yang, Yang
    Xiao, Liang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [6] Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer
    Liu, Chenyang
    Zhao, Rui
    Shi, Zhenwei
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [7] Improving Remote Sensing Image Captioning by Combining Grid Features and Transformer
    Zhuang, Shuo
    Wang, Ping
    Wang, Gang
    Wang, Di
    Chen, Jinyong
    Gao, Feng
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [8] From Plane to Hierarchy: Deformable Transformer for Remote Sensing Image Captioning
    Du, Runyan
    Cao, Wei
    Zhang, Wenkai
    Zhi, Guo
    Sun, Xian
    Li, Shuoke
    Li, Jihao
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 7704 - 7717
  • [9] Aware-Transformer: A Novel Pure Transformer-Based Model for Remote Sensing Image Captioning
    Cao, Yukun
    Yan, Jialuo
    Tang, Yijia
    He, Zhenyi
    Xu, Kangle
    Cheng, Yu
    [J]. ADVANCES IN COMPUTER GRAPHICS, CGI 2023, PT I, 2024, 14495 : 105 - 117
  • [10] A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning
    Ren, Zihao
    Gou, Shuiping
    Guo, Zhang
    Mao, Shasha
    Li, Ruimin
    [J]. REMOTE SENSING, 2022, 14 (12)