Image Captioning using Reinforcement Learning with BLUDEr Optimization

Cited by: 8
Authors
Devi, P. R. [1]
Thrivikraman, V. [1]
Kashyap, D. [1]
Shylaja, S. S. [1]
Affiliations
[1] PES Univ, Dept Comp Sci & Engn, Bangalore 560085, Karnataka, India
Keywords
image captioning; reinforcement learning; BLUDEr; self-critical sequence training; policy-gradient; deep learning; ResNet; long short-term memory; attention; BLEU; CIDEr
DOI
10.1134/S1054661820040094
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Image captioning is a growing field of research that has drawn wide attention from the research community. It is a challenging task owing to the complexity of natural language generation and the difficulty of extracting features from a diverse collection of images. Many models have been proposed to tackle the problem, such as state-of-the-art encoder-decoder (sequential CNN-RNN) systems, which have proved effective. Recently, reinforcement learning has emerged as a new approach to the problem and has surpassed many of the state-of-the-art paradigms. We propose a new reward known as the BLUDEr metric, which is a linear combination of the non-differentiable metrics BLEU and CIDEr. We directly optimize this metric for our model on natural language generation tasks. In our experiments, we use the Flickr30k and Flickr8k datasets, two of the benchmark datasets for image captioning systems. We achieve state-of-the-art results on these two datasets when compared with other models.
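The abstract describes the reward only as a linear combination of BLEU and CIDEr, and the keywords mention self-critical sequence training. The sketch below illustrates how such a combined reward could plug into a self-critical policy-gradient loss. It is not the authors' implementation: the function names, the weights w_bleu and w_cider (and their 0.5 defaults), and the assumption that BLEU and CIDEr scores are computed elsewhere are all hypothetical.

```python
from typing import Sequence


def bluder_reward(bleu: float, cider: float,
                  w_bleu: float = 0.5, w_cider: float = 0.5) -> float:
    """Linear combination of two caption scores.

    The weights are hypothetical; the abstract does not state them.
    """
    return w_bleu * bleu + w_cider * cider


def self_critical_loss(sample_logprobs: Sequence[float],
                       sample_reward: float,
                       baseline_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled caption.

    sample_logprobs: log-probabilities of the sampled caption's tokens.
    sample_reward:   reward of the sampled caption.
    baseline_reward: reward of the greedy-decoded caption (the baseline).
    """
    # A positive advantage means the sample beat the greedy caption,
    # so minimizing the loss raises the sample's log-probability.
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_logprobs)


# Illustrative numbers only, not results from the paper.
r_sample = bluder_reward(bleu=0.25, cider=0.75)
r_greedy = bluder_reward(bleu=0.25, cider=0.25)
loss = self_critical_loss([-0.5, -1.0, -0.25], r_sample, r_greedy)
```

Because the reward enters the loss only as a scalar multiplier on the log-probabilities, it never has to be differentiated, which is why non-differentiable metrics such as BLEU and CIDEr can be optimized this way.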
Pages: 607-613
Number of pages: 7
Related papers
50 in total
  • [1] Image Captioning using Reinforcement Learning with BLUDEr Optimization
    P. R. Devi
    V. Thrivikraman
    D. Kashyap
    S. S. Shylaja
    [J]. Pattern Recognition and Image Analysis, 2020, 30 : 607 - 613
  • [2] Image Captioning using Adversarial Networks and Reinforcement Learning
    Yan, Shiyang
    Wu, Fangyu
    Smith, Jeremy S.
    Lu, Wenjin
    Zhang, Bailing
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 248 - 253
  • [3] Reinforcement Learning Transformer for Image Captioning Generation Model
    Yan, Zhaojie
    [J]. FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701
  • [4] Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
    Honda, Ukyo
    Watanabe, Taro
    Matsumoto, Yuji
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1124 - 1134
  • [5] Image Captioning: From Encoder-Decoder to Reinforcement Learning
    Tang, Yu
    [J]. 2022 6TH INTERNATIONAL CONFERENCE ON IMAGING, SIGNAL PROCESSING AND COMMUNICATIONS, ICISPC, 2022, : 6 - 10
  • [6] Image Captioning using Deep Learning
    Jain, Yukti Sanjay
    Dhopeshwar, Tanisha
    Chadha, Supreet Kaur
    Pagire, Vrushali
    [J]. 2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021,
  • [7] Image Captioning Using Deep Learning
    Adithya, Paluvayi Veera
    Kalidindi, Mourya Viswanadh
    Swaroop, Nallani Jyothi
    Vishwas, H. N.
    [J]. ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 42 - 58
  • [8] Remote sensing image captioning via Variational Autoencoder and Reinforcement Learning
    Shen, Xiangqing
    Liu, Bing
    Zhou, Yong
    Zhao, Jiaqi
    Liu, Mingming
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 203
  • [9] Multi-Level Policy and Reward Reinforcement Learning for Image Captioning
    Liu, An-An
    Xu, Ning
    Zhang, Hanwang
    Nie, Weizhi
    Su, Yuting
    Zhang, Yongdong
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 821 - 827
  • [10] Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
    Guo, Tszhang
    Chang, Shiyu
    Yu, Mo
    Bai, Kun
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 751 - 756