Captioning Remote Sensing Images Using Transformer Architecture

Times cited: 2
Authors
Nanal, Wrucha [1]
Hajiarbabi, Mohammadreza [1]
Affiliation
[1] Purdue Univ, Ft Wayne, IN 46805 USA
DOI
10.1109/ICAIIC57133.2023.10067039
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning aims to have machines describe images in natural language, combining the fields of Computer Vision (CV) and Natural Language Processing (NLP). The current state of the art in image captioning uses the attention-based encoder-decoder model. This model relies on an attention mechanism that focuses on a particular region of the image to generate the corresponding word of the caption, while its NLP side uses Long Short-Term Memory (LSTM) for word generation. Attention-based models do not emphasize the relative arrangement of words in a caption and thereby ignore the context of the sentence. Inspired by the versatility of Transformers in NLP, this work applies the Transformer architecture to the image captioning task. It also uses a pretrained Bidirectional Encoder Representations from Transformers (BERT) model, which generates a contextually rich embedding of a caption. The multi-head attention of the Transformer establishes a strong correlation between the image and the contextually aware caption. The experiments are performed on the Remote Sensing Image Captioning Dataset. The results are evaluated with NLP metrics: Bilingual Evaluation Understudy (BLEU-1 to BLEU-4), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The proposed model achieves better results on some of these metrics.
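The abstract states that the Transformer's multi-head attention correlates the image with the contextual caption embedding. The following is a minimal pure-Python sketch of multi-head cross-attention, assuming caption tokens (for example BERT embeddings) act as queries and image region features act as keys and values. It is not the authors' implementation: the learned projection matrices are replaced by identity maps for brevity, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    scale = 1.0 / math.sqrt(len(k[0]))
    weights = []
    for qi in q:
        # One score per key: scaled dot product between the query and that key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value rows.
    output = [
        [sum(w * vj[c] for w, vj in zip(wi, v)) for c in range(len(v[0]))]
        for wi in weights
    ]
    return output, weights


def multi_head_cross_attention(captions: Matrix, regions: Matrix, num_heads: int):
    """Caption tokens (queries) attend over image regions (keys and values).

    Hypothetical simplification: the learned W_Q, W_K, W_V and W_O projections
    are identity maps, so each head sees a contiguous slice of the features.
    """
    d_model = len(captions[0])
    if d_model % num_heads != 0 or len(regions[0]) != d_model:
        raise ValueError("feature sizes must match and divide evenly by num_heads")
    d_head = d_model // num_heads
    outputs: Matrix = [[] for _ in captions]
    all_weights = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q = [row[lo:hi] for row in captions]
        kv = [row[lo:hi] for row in regions]
        out, w = scaled_dot_product_attention(q, kv, kv)
        all_weights.append(w)
        for i, row in enumerate(out):
            outputs[i].extend(row)  # concatenate the per-head outputs
    return outputs, all_weights


# Toy usage: 2 caption tokens and 3 image regions, all with d_model = 4.
caption_tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, -0.5, 0.5]]
image_regions = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
out, weights = multi_head_cross_attention(caption_tokens, image_regions, num_heads=2)
```

Each head produces, for every caption token, a probability distribution over image regions, so different heads can link the same word to different parts of the image. A real model would learn separate projections per head instead of slicing the raw features.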
Pages: 413-418
Number of pages: 6