BENet: bi-directional enhanced network for image captioning

Cited by: 0
Authors
Peixin Yan
Zuoyong Li
Rong Hu
Xinrong Cao
Affiliations
[1] Fujian University of Technology, Fujian Provincial Key Laboratory of Big Data Mining and Applications, School of Computer Science and Mathematics
[2] Minjiang University, Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Image captioning; Transformer; Bi-directional enhanced network; Memory bank; Reconstruct
DOI
Not available
Abstract
Transformer-based models have been used in image captioning to generate natural-language text that accurately describes a given image. In this paper, we propose a bi-directional enhanced network that uses a memory bank to strengthen the correlation between image features and text features, improving the performance of the transformer-based encoder–decoder framework for image captioning. In addition, we fine-tune the connection method in the encoder to obtain rich image features. Specifically, during training, the memory bank is first used to store the correspondences between images and annotated texts in the dataset as additional information for the image features. After encoding, we feed the visual features, composed of the image features and the additional information from the memory bank, into the decoder to generate better captions. We then use a decoder-like architecture to reconstruct the visual features from the generated caption. Finally, we calculate the similarity loss between the reconstructed features and the visual features to optimize the encoder. Extensive experiments on the MSCOCO benchmark show that the proposed method achieves promising results on both the Karpathy test split and the online test server, providing evidence of its effectiveness.
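The reconstruction step in the abstract compares reconstructed features against the encoder's visual features through a similarity loss. The abstract does not say which similarity measure is used, so the following is only a minimal sketch that assumes cosine similarity; the function names `cosine_similarity` and `similarity_loss` are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, eps)


def similarity_loss(visual, reconstructed):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    ASSUMPTION: cosine similarity is an illustrative choice; the paper's
    actual loss may differ.
    """
    if len(visual) != len(reconstructed):
        raise ValueError("feature lists must have the same length")
    terms = [1.0 - cosine_similarity(v, r)
             for v, r in zip(visual, reconstructed)]
    return sum(terms) / len(terms)


# Toy example: one feature reconstructed well, one reconstructed poorly.
visual = [[1.0, 0.0], [0.0, 1.0]]
reconstructed = [[2.0, 0.0], [1.0, 0.0]]
loss = similarity_loss(visual, reconstructed)
```

Under this assumption the loss is 0 when every reconstructed vector points in the same direction as its visual counterpart and grows as the two diverge, which gives the encoder a signal to keep its features recoverable from the caption.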
Related papers
50 in total
  • [21] BDNE: A Method of Bi-Directional Distance Network Embedding
    Zhu, Dongjie
    Sun, Yundong
    Cao, Ning
    Qiao, Xueming
    Xu, Ming
    Li, Jinlin
    Yang, Junzhou
    2019 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC), 2019, : 158 - 161
  • [22] Bi-Directional Link Multiplexing for MIMO Mesh Network
    Ono, Fumie
    Sakaguchi, Kei
    2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 1524 - +
  • [23] Single-wavelength-pump bi-directional hybrid fiber amplifier for bi-directional local area network application
    Guo, Mars Ning
    Liaw, Shien-Kuei
    Shum, Perry Ping
    Chen, Nan-Kuang
    Hung, Hsin-Kai
    Lin, Chinlon
    OPTICS COMMUNICATIONS, 2011, 284 (02) : 573 - 578
  • [24] Bi-attention network for bi-directional salient object detection
    Xu, Cheng
    Wang, Hui
    Liu, Xianhui
    Zhao, Weidong
    APPLIED INTELLIGENCE, 2023, 53 (19) : 21500 - 21516
  • [25] BI-DIRECTIONAL NORMALIZATION AND COLOR ATTENTION-GUIDED GENERATIVE ADVERSARIAL NETWORK FOR IMAGE ENHANCEMENT
    Liu, Shan
    Xiao, Guoqiang
    Xu, Xiaohui
    Wu, Song
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2205 - 2209
  • [26] A Quantum Bi-Directional Self-Organizing Neural Network (QBDSONN) for Binary Image Denoising
    Konar, Debanjan
    Bhattacharyya, Siddhartha
    Das, Nibaran
    Panigrahi, Bijaya Ketan
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2015, : 1225 - 1230
  • [27] EBiDA-FPN: enhanced bi-directional attention feature pyramid network for object detection
    Yang, Xiaobao
    He, Yulong
    Wu, Junsheng
    Wang, Wentao
    Sun, Wei
    Ma, Sugang
    Hou, Zhiqiang
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)
  • [28] A Bi-directional Interface linking a Dialysis Network with a Clinical Information Network
    Allen, Glen
    Korossy, Steve
    Frost, Rubin
    Barbara, Jeffrey A. J.
    ELECTRONIC JOURNAL OF HEALTH INFORMATICS, 2009, 4 (01):
  • [29] Unifying Multimodal Transformer for Bi-directional Image and Text Generation
    Huang, Yupan
    Xue, Hongwei
    Liu, Bei
    Lu, Yutong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1138 - 1147
  • [30] IMAGE ENLARGEMENT USING BI-DIRECTIONAL SHIFTED LINEAR INTERPOLATION
    Tamura, Yuta
    Tanaka, Kiyoshi
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS SYSTEMS (ISPACS 2008), 2008, : 290 - 293