Transformer-based image captioning by leveraging sentence information

Cited by: 0
Authors
Chahkandi, Vahid [1]
Fadaeieslam, Mohammad Javad [1]
Yaghmaee, Farzin [1]
Affiliations
[1] Semnan Univ, Fac Elect & Comp Engn, Semnan, Iran
Keywords
image captioning; nonautoregressive; attention; transformer; MODELS;
DOI
10.1117/1.JEI.31.4.043005
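Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes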
0808 ; 0809 ;
Abstract
Although autoregressive image captioning methods yield good-quality image descriptions, their sequential structure slows down sentence generation. To overcome this shortcoming, some nonautoregressive models have been proposed, but the quality of the sentences they produce is lower than that of autoregressive methods. We have designed a new structure based on nonautoregressive methods that not only finds better relations between the words of a sentence and the salient objects in an image but also combines this information with positional information extracted from the sentence to generate a higher-quality target sentence. Experimental results on the standard benchmark show that our proposed model outperforms general nonautoregressive captioning models.
Pages: 15