Enhancing Image Captioning with Transformer-Based Two-Pass Decoding Framework

Citations: 0
Authors
Su, Jindian [1]
Mou, Yueqi [1]
Xie, Yunhao [2]
Affiliations
[1] South China University of Technology, School of Computer Science and Engineering, Guangzhou, People's Republic of China
[2] South China University of Technology, School of Software Engineering, Guangzhou, People's Republic of China
Keywords
Image Captioning; Two-Pass Decoding; Transformer
DOI
10.1007/978-981-97-5663-6_15
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The two-pass decoding framework significantly enhances image captioning models. However, existing two-pass models are often trained from scratch and therefore fail to leverage the knowledge already captured by pre-trained single-pass models, which increases training cost and complexity. In this paper, we propose a unified two-pass decoding framework comprising three core modules: a pre-trained Visual Encoder, a pre-trained Draft Decoder, and a Deliberation Decoder. To align the image with the draft caption and let each complement the other, we design a Cross-Modality Fusion (CMF) module in the Deliberation Decoder, forming a Cross-Modality Fusion-based Deliberation Decoder (CMF-DD). During training, we transfer foundational knowledge by extensively sharing parameters between the Draft and Deliberation Decoders. At the same time, we freeze the parameters inherited from the single-pass baseline and update only a small subset within the Deliberation Decoder, which reduces cost and complexity. Additionally, we introduce a Dominance-Adaptive reward scoring algorithm in the reinforcement learning stage that specifically targets the quality of refinements. Experiments on the MS COCO dataset demonstrate that our method achieves substantial improvements over single-pass decoding baselines and competes favorably with other two-pass decoding methods.
Pages: 171-183
Page count: 13