Collaborative Learning Method for Natural Image Captioning

Cited by: 0
Authors
Wang, Rongzhao [1]
Liu, Libo [1]
Affiliations
[1] Ningxia University, School of Information Engineering, Yinchuan, People's Republic of China
Keywords
Image captioning; Pix2pix inverting; Collaborative learning
DOI
10.1007/978-981-19-5194-7_19
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a collaborative learning method for natural image captioning. Many existing methods use CNNs pretrained for image classification to obtain the feature representations used for caption generation, which ignores the gap in image feature representations between different computer vision tasks. To address this problem, our method exploits the similarity between the image captioning and pix-to-pix inverting tasks to narrow the feature representation gap. Specifically, our framework consists of two modules: 1) the pix2pix module (P2PM), which has a shared feature extractor that extracts feature representations and a U-Net architecture that encodes the image into a latent code and then decodes it back to the original image; and 2) the natural language generation module (NLGM), which generates descriptions from the feature representations extracted by P2PM. Consequently, both the feature representations and the generated image captions improve during the collaborative learning process. Experimental results on the MSCOCO 2017 dataset demonstrate the effectiveness of our approach compared with the other methods evaluated.
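The abstract does not state the training objective. One plausible reading of the two-module design, given here only as a sketch, is a joint loss in which the captioning and reconstruction branches share the feature extractor. All symbols below are assumptions introduced for illustration and do not come from the paper: $f_\theta$ is the shared feature extractor, $g_\phi$ the U-Net encoder-decoder, $h_\psi$ the caption generator, $x$ an image, $y = (y_1, \dots, y_T)$ its reference caption, and $\lambda$ a weighting factor.

```latex
% Hypothetical joint objective (an assumption, not the authors' stated loss).
% Both terms depend on the shared parameters \theta, so gradients from the
% reconstruction task also shape the features used for captioning.
\begin{align}
  \mathcal{L}_{\text{cap}}(\theta, \psi)
    &= -\sum_{t=1}^{T} \log p_\psi\bigl(y_t \mid y_{<t},\, f_\theta(x)\bigr), \\
  \mathcal{L}_{\text{rec}}(\theta, \phi)
    &= \bigl\lVert g_\phi\bigl(f_\theta(x)\bigr) - x \bigr\rVert_1, \\
  \mathcal{L}(\theta, \phi, \psi)
    &= \mathcal{L}_{\text{cap}}(\theta, \psi)
       + \lambda\, \mathcal{L}_{\text{rec}}(\theta, \phi).
\end{align}
```

The L1 reconstruction term and the single weight $\lambda$ are illustrative choices. The paper may use a different reconstruction loss, an adversarial term, or an alternating update schedule.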
Pages: 249-261
Page count: 13