Transformer-based image captioning by leveraging sentence information

Cited by: 0
Authors
Chahkandi, Vahid [1]
Fadaeieslam, Mohammad Javad [1]
Yaghmaee, Farzin [1]
Affiliations
[1] Semnan Univ, Fac Elect & Comp Engn, Semnan, Iran
Keywords
image captioning; nonautoregressive; attention; transformer; models
DOI
10.1117/1.JEI.31.4.043005
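Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract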
Although autoregressive image captioning methods yield good-quality image descriptions, their sequential structure slows down sentence generation. To overcome this shortcoming, some nonautoregressive models have been proposed, but the sentences they produce are of lower quality than those obtained with autoregressive methods. We have designed a new structure based on nonautoregressive methods that not only finds better relations between the words of a sentence and the salient objects in an image but also combines this information with positional information extracted from the sentence to generate a higher-quality target sentence. Experimental results on the standard benchmark show that our proposed model outperforms general nonautoregressive captioning models.
Pages: 15
Related papers
50 in total
  • [41] Reinforced Transformer for Medical Image Captioning
    Xiong, Yuxuan
    Du, Bo
    Yan, Pingkun
    [J]. MACHINE LEARNING IN MEDICAL IMAGING (MLMI 2019), 2019, 11861 : 673 - 680
  • [42] Transformer with a Parallel Decoder for Image Captioning
    Wei, Peilang
    Liu, Xu
    Luo, Jun
    Pu, Huayan
    Huang, Xiaoxu
    Wang, Shilong
    Cao, Huajun
    Yang, Shouhong
    Zhuang, Xu
    Wang, Jason
    Yue, Hong
    Ji, Cheng
    Zhou, Mingliang
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [43] Image captioning with transformer and knowledge graph
    Zhang, Yu
    Shi, Xinyu
    Mi, Siya
    Yang, Xu
    [J]. PATTERN RECOGNITION LETTERS, 2021, 143 (143) : 43 - 49
  • [44] Complementary Shifted Transformer for Image Captioning
    Liu, Yanbo
    Yang, You
    Xiang, Ruoyu
    Ma, Jixin
    [J]. NEURAL PROCESSING LETTERS, 2023, 55 : 8339 - 8363
  • [45] ReFormer: The Relational Transformer for Image Captioning
    Yang, Xuewen
    Liu, Yingru
    Wang, Xin
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 5398 - 5406
  • [46] ETransCap: efficient transformer for image captioning
    Mundu, Albert
    Singh, Satish Kumar
    Dubey, Shiv Ram
    [J]. APPLIED INTELLIGENCE, 2024, 54 (21) : 10748 - 10762
  • [47] Direction Relation Transformer for Image Captioning
    Song, Zeliang
    Zhou, Xiaofei
    Dong, Linhua
    Tan, Jianlong
    Guo, Li
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021 : 5056 - 5064
  • [48] DesnowFormer: an effective transformer-based image desnowing network
    Zhang, Ting
    Jiang, Nanfeng
    Lin, Junhong
    Lin, Jielian
    Zhao, Tiesong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022
  • [49] Recent progress in transformer-based medical image analysis
    Liu, Zhaoshan
    Lv, Qiujie
    Yang, Ziduo
    Li, Yifan
    Lee, Chau Hung
    Shen, Lei
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 164
  • [50] TransInpaint: Transformer-based Image Inpainting with Context Adaptation
    Shamsolmoali, Pourya
    Zareapoor, Masoumeh
    Granger, Eric
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023 : 849 - 858