BENet: bi-directional enhanced network for image captioning

Authors
Peixin Yan
Zuoyong Li
Rong Hu
Xinrong Cao
Affiliations
[1] Fujian University of Technology, Fujian Provincial Key Laboratory of Big Data Mining and Applications, School of Computer Science and Mathematics
[2] Minjiang University, Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Image captioning; Transformer; Bi-directional enhanced network; Memory bank; Reconstruct
Abstract
Transformer-based models have been used in image captioning to generate natural-language text that accurately describes a given image. In this paper, we propose a bi-directional enhanced network (BENet) that uses a memory bank to strengthen the correlation between image features and text features, improving the performance of the transformer-based encoder–decoder framework for image captioning. In addition, we fine-tune the connection method in the encoder to obtain richer image features. Specifically, during training, the memory bank first stores the correspondences between images and annotated texts in the dataset as additional information for the image features. After encoding, we feed the visual features, composed of the image features and the additional information from the memory bank, into the decoder to generate a better caption. Subsequently, we use a decoder-like architecture to reconstruct the visual features from the generated caption. Finally, we compute the similarity loss between the reconstructed features and the visual features to optimize the encoder. Extensive experiments on the MSCOCO benchmark show that the proposed method achieves promising results on both the Karpathy test split and the online test server, providing evidence of its effectiveness.
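The final training step in the abstract, a similarity loss between reconstructed and visual features, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use; the code below assumes cosine similarity and averages (1 - cosine similarity) over paired feature vectors. The function names `cosine_similarity` and `similarity_loss` are hypothetical, and plain Python lists stand in for feature tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_loss(visual, reconstructed):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    ASSUMPTION: cosine similarity is an illustrative choice; the abstract
    only states that a similarity loss is computed.
    """
    if len(visual) != len(reconstructed):
        raise ValueError("feature sets must contain the same number of vectors")
    if not visual:
        raise ValueError("feature sets must not be empty")
    total = sum(
        1.0 - cosine_similarity(v, r) for v, r in zip(visual, reconstructed)
    )
    return total / len(visual)


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 2.0]]
    reconstructed = [[0.9, 0.1], [0.2, 1.8]]
    print(similarity_loss(visual, reconstructed))
```

Under this assumed formulation the loss is zero when every reconstructed vector points in the same direction as its visual counterpart and grows as the directions diverge, so minimizing it would push the encoder toward features that can be recovered from the generated caption.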
    JOURNAL OF ELECTRICAL ENGINEERING-ELEKTROTECHNICKY CASOPIS, 2020, 71 (03): : 203 - 209