BENet: bi-directional enhanced network for image captioning

Cited by: 0
Authors
Peixin Yan
Zuoyong Li
Rong Hu
Xinrong Cao
Affiliations
[1] Fujian University of Technology, Fujian Provincial Key Laboratory of Big Data Mining and Applications, School of Computer Science and Mathematics
[2] Minjiang University, Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, College of Computer and Control Engineering
Source
Multimedia Systems | 2024 / Vol. 30
Keywords
Image captioning; Transformer; Bi-directional enhanced network; Memory bank; Reconstruct;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Transformer-based models have been used in image captioning to generate natural-language text that accurately describes a given image. In this paper, we propose a bi-directional enhanced network (BENet), which uses a memory bank to strengthen the correlation between image features and text features and thereby improves the performance of the transformer-based encoder–decoder framework for image captioning. In addition, we fine-tune the connection method in the encoder to obtain rich image features. Specifically, during training, the memory bank first stores the correspondences between images and annotated texts in the dataset as additional information for the image features. After encoding, we feed the visual features, composed of the image features and the additional information from the memory bank, into the decoder to generate a better caption. Subsequently, we use a decoder-like architecture to reconstruct the visual features from the generated caption. Finally, we calculate the similarity loss between the reconstructed features and the visual features to optimize the encoder. Extensive experiments on the MSCOCO benchmark show that the proposed method achieves promising results on both the Karpathy test split and the online test server, providing evidence of its effectiveness.
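The final training step in the abstract, a similarity loss between the reconstructed features and the visual features, can be illustrated with a small sketch. The abstract does not state which similarity measure the authors use, so the cosine-distance form below is an assumption, and the names cosine_similarity and similarity_loss are hypothetical rather than taken from the paper. Features are plain lists of floats here; a real model would operate on tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def similarity_loss(visual_feats, reconstructed_feats):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    Assumption: the loss form is illustrative only; the abstract says
    "similarity loss" without specifying the measure.
    """
    if len(visual_feats) != len(reconstructed_feats):
        raise ValueError("feature sets must contain the same number of vectors")
    if not visual_feats:
        raise ValueError("feature sets must not be empty")
    total = sum(
        1.0 - cosine_similarity(v, r)
        for v, r in zip(visual_feats, reconstructed_feats)
    )
    return total / len(visual_feats)


# Toy features: two 2-dimensional vectors per set.
visual = [[1.0, 0.0], [0.0, 2.0]]
aligned = [[2.0, 0.0], [0.0, 1.0]]      # same directions, different scale
orthogonal = [[0.0, 1.0], [1.0, 0.0]]   # perpendicular to the visual features

loss_aligned = similarity_loss(visual, aligned)
loss_orthogonal = similarity_loss(visual, orthogonal)
```

Under this assumed form, the loss is smallest when each reconstructed vector points in the same direction as its visual counterpart, regardless of scale, which is one plausible way to push the encoder toward features that the caption can recover.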
Related papers
50 in total
  • [1] BENet: bi-directional enhanced network for image captioning
    Yan, Peixin
    Li, Zuoyong
    Hu, Rong
    Cao, Xinrong
    MULTIMEDIA SYSTEMS, 2024, 30 (01)
  • [2] Bi-Directional Co-Attention Network for Image Captioning
    Jiang, Weitao
    Wang, Weixuan
    Hu, Haifeng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (04)
  • [3] Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    2019 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2019, : 167 - 173
  • [4] Bi-Directional Seed Attention Network for Interactive Image Segmentation
    Song, Gwangmo
    Lee, Kyoung Mu
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 1540 - 1544
  • [5] Bi-directional Relationship Inferring Network for Referring Image Segmentation
    Hu, Zhiwei
    Feng, Guang
    Sun, Jiayu
    Zhang, Lihe
    Lu, Huchuan
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4423 - 4432
  • [6] TCP With Network Coding Enhanced in Bi-Directional Loss Tolerance
    Nguyen Viet Ha
    Nguyen, Tran T. T.
    Tsuru, Masato
    IEEE COMMUNICATIONS LETTERS, 2020, 24 (03) : 520 - 524
  • [7] Speech-to-gesture generation using bi-directional LSTM network
    Kaneko N.
    Takeuchi K.
    Hasegawa D.
    Shirakawa S.
    Sakuta H.
    Sumi K.
    Transactions of the Japanese Society for Artificial Intelligence, 2019, 34 (06):
  • [8] Image Fusion Using Bi-directional Similarity
    Bai Chunshan
    Luo Xiaoyan
    HOLOGRAPHY: ADVANCES AND MODERN TRENDS IV, 2015, 9508
  • [9] Bi-directional Interaction Network for Person Search
    Dong, Wenkai
    Zhang, Zhaoxiang
    Song, Chunfeng
    Tan, Tieniu
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2836 - 2845
  • [10] Bi-Directional Pyramid Network for Edge Detection
    Li, Kai
    Tian, Yingjie
    Wang, Bo
    Qi, Zhiquan
    Wang, Qi
    ELECTRONICS, 2021, 10 (03) : 1 - 15