Captioning Remote Sensing Images Using Transformer Architecture

Cited by: 2

Authors:

Nanal, Wrucha [1]

Hajiarbabi, Mohammadreza [1]

Affiliations:

[1] Purdue Univ, Ft Wayne, IN 46805 USA

Source:

2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC | 2023

Keywords:

DOI:

10.1109/ICAIIC57133.2023.10067039

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image Captioning aims to have machines describe images, combining the fields of Computer Vision (CV) and Natural Language Processing (NLP). The current state of the art for image captioning uses the Attention-based Encoder-Decoder model. The Attention-based model uses an 'Attention mechanism' that focuses on a particular section of the image to generate the corresponding caption word. The NLP side of this model uses Long Short-Term Memory (LSTM) for word generation. Attention-based models do not emphasize the relative arrangement of words in a caption, thereby ignoring the context of the sentence. Inspired by the versatility of Transformers in NLP, this work tries to utilise their architectural features for the Image Captioning use case. This work also makes use of a pretrained Bidirectional Encoder Representations from Transformers (BERT) model, which generates a contextually rich embedding of a caption. The Multi-Head Attention of the Transformer establishes a strong correlation between the image and the contextually aware caption. The experiment is performed on the Remote Sensing Image Captioning Dataset. The results of the model are evaluated using NLP evaluation metrics such as Bilingual Evaluation Understudy 1-4 (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The proposed model shows better results for a few of the metrics.
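To make the attention idea in the abstract concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention, in which caption-word vectors act as queries over image-region features. It is not the authors' model: the function names, the toy vectors, and the absence of learned projection matrices are all assumptions made for illustration. A Multi-Head Attention layer would run several such heads in parallel on learned projections and concatenate their outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative only).

    queries: one vector per caption word
    keys, values: one vector each per image region
    Returns the attended vectors and the attention weights per query.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this word to every image region, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Weighted sum of region values, one output channel at a time.
        out = [
            sum(w_j * v[c] for w_j, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy data: two caption words, three image regions.
word_queries = [[1.0, 0.0], [0.0, 1.0]]
region_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
region_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attended, attn = cross_attention(word_queries, region_keys, region_values)
```

In this toy setup each word places most of its attention weight on the region whose key aligns with its query, which is the behaviour the abstract describes as focusing on a particular section of the image for each caption word.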


Pages: 413-418

Page count: 6

