Enhancing Image Captioning with Transformer-Based Two-Pass Decoding Framework

Cited by: 0
Authors
Su, Jindian [1]
Mou, Yueqi [1]
Xie, Yunhao [2]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
Keywords
Image Captioning; Two-Pass Decoding; Transformer
DOI
10.1007/978-981-97-5663-6_15
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The two-pass decoding framework significantly enhances image captioning models. However, existing two-pass models often train from scratch, missing the opportunity to fully leverage pre-trained knowledge from single-pass models. This practice leads to increased training cost and complexity. In this paper, we propose a unified two-pass decoding framework comprising three core modules: a pre-trained Visual Encoder, a pre-trained Draft Decoder, and a Deliberation Decoder. To enable effective information alignment and complementation between the image and the draft caption, we design a Cross-Modality Fusion (CMF) module in the Deliberation Decoder, forming a Cross-Modality Fusion-based Deliberation Decoder (CMF-DD). During training, we facilitate the transfer of foundational knowledge by extensively sharing parameters between the Draft and Deliberation Decoders. At the same time, we fix the parameters inherited from the single-pass baseline and update only a small subset within the Deliberation Decoder to reduce cost and complexity. Additionally, we introduce a Dominance-Adaptive reward scoring algorithm in the reinforcement learning stage to specifically improve the quality of refinements. Experiments on the MS COCO dataset demonstrate that our method achieves substantial improvements over single-pass decoding baselines and competes favorably with other two-pass decoding methods.
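The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `two_pass_caption` and the toy stand-ins `toy_encode`, `toy_draft`, and `toy_deliberate` are hypothetical names introduced here, and the real Visual Encoder, Draft Decoder, and CMF-DD are Transformer modules rather than plain functions. The sketch only shows that the second pass receives both the image features and the draft caption.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]  # stand-in for the visual encoder output
Caption = List[str]         # a caption as a list of tokens


def two_pass_caption(
    image: object,
    encode: Callable[[object], Features],
    draft_decode: Callable[[Features], Caption],
    deliberate: Callable[[Features, Caption], Caption],
) -> Tuple[Caption, Caption]:
    """Run both passes and return (draft, refined)."""
    features = encode(image)               # pre-trained visual encoder
    draft = draft_decode(features)         # first pass: pre-trained draft decoder
    refined = deliberate(features, draft)  # second pass: uses image AND draft
    return draft, refined


# Toy stand-ins (assumptions for illustration only).
def toy_encode(image: object) -> Features:
    return [1.0, 0.0]


def toy_draft(features: Features) -> Caption:
    return ["a", "dog", "on", "grass"]


def toy_deliberate(features: Features, draft: Caption) -> Caption:
    # Refine one generic word when a made-up feature value is high.
    return ["puppy" if (w == "dog" and features[0] > 0.5) else w for w in draft]


draft, refined = two_pass_caption("img.jpg", toy_encode, toy_draft, toy_deliberate)
print(draft)
print(refined)
```

In the framework the abstract describes, the first two stages would stay fixed and only a small subset of the deliberation stage would be trained; the sketch does not model training.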
Pages: 171-183
Page count: 13