A performance analysis of transformer-based deep learning models for Arabic image captioning

被引:0
|
作者
Alsayed, Ashwaq [1 ]
Qadah, Thamir M. [1 ]
Arif, Muhammad [1 ]
机构
[1] Umm Al Qura Univ, Coll Comp & Informat Syst, Comp Sci Dept, Mecca, Saudi Arabia
关键词
Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning; Arabic technologies;
D O I
10.1016/j.jksuci.2023.101750
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work focused on performing the image captioning task in English, and only a few proposals exist that address the image captioning task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves the AIC performance by up to 34-92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores performance by 9-11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196-379% in the BLUE-4 score. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Explaining transformer-based image captioning models: An empirical analysis
    Cornia, Marcella
    Baraldi, Lorenzo
    Cucchiara, Rita
    [J]. AI COMMUNICATIONS, 2022, 35 (02) : 111 - 129
  • [2] Bornon: Bengali Image Captioning with Transformer-Based Deep Learning Approach
    Faisal Muhammad Shah
    Mayeesha Humaira
    Md Abidur Rahman Khan Jim
    Amit Saha Ami
    Shimul Paul
    [J]. SN Computer Science, 2022, 3 (1)
  • [3] An Ensemble of Arabic Transformer-based Models for Arabic Sentiment Analysis
    El Karfi, Ikram
    El Fkihi, Sanaa
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 561 - 567
  • [4] Transformer-based Extraction of Deep Image Models
    Battis, Verena
    Penner, Alexander
    [J]. 2022 IEEE 7TH EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2022), 2022, : 320 - 336
  • [5] A Sparse Transformer-Based Approach for Image Captioning
    Lei, Zhou
    Zhou, Congcong
    Chen, Shengbo
    Huang, Yiyong
    Liu, Xianrui
    [J]. IEEE ACCESS, 2020, 8 : 213437 - 213446
  • [6] A Sparse Transformer-Based Approach for Image Captioning
    Lei, Zhou
    Zhou, Congcong
    Chen, Shengbo
    Huang, Yiyong
    Liu, Xianrui
    [J]. IEEE Access, 2020, 8 : 213437 - 213446
  • [7] ThaiTC:Thai Transformer-based Image Captioning
    Jaknamon, Teetouch
    Marukatat, Sanparith
    [J]. 2022 17TH INTERNATIONAL JOINT SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE PROCESSING (ISAI-NLP 2022) / 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INTERNET OF THINGS (AIOT 2022), 2022,
  • [8] A Review of Transformer-Based Approaches for Image Captioning
    Ondeng, Oscar
    Ouma, Heywood
    Akuon, Peter
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (19):
  • [9] Transformer-based image captioning by leveraging sentence information
    Chahkandi, Vahid
    Fadaeieslam, Mohammad Javad
    Yaghmaee, Farzin
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
  • [10] Automatic Bangla Image Captioning Based on Transformer Model in Deep Learning
    Hossain, Md Anwar
    Hasan, Mirza A. F. M. Rashidul
    Hossen, Ebrahim
    Asraful, Md
    Faruk, Md Omar
    Abadin, A. F. M. Zainul
    Ali, Md Suhag
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (11) : 1110 - 1117