Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Cited by: 138
Authors
Ren, Zhou [1]
Wang, Xiaoyu [1]
Zhang, Ning [1]
Lv, Xutao [1]
Li, Li-Jia [2]
Affiliations
[1] Snap Inc, Venice, CA USA
[2] Google Inc, Mountain View, CA 94043 USA
DOI
10.1109/CVPR.2017.128
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
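The abstract describes two mechanisms that a short sketch can make concrete: a reward computed from a visual-semantic embedding, and a decoding step in which the policy network's local confidence is combined with the value network's estimate of each possible extension. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The cosine-similarity reward, the linear mixing weight `lam`, the greedy (rather than beam-search) decoding, and the helper names `embedding_reward`, `advantage`, and `lookahead_step` are all hypothetical. `policy` and `value` stand in for trained networks.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical type aliases: a policy maps a caption prefix to next-word
# probabilities; a value function maps a (partial) caption to a scalar score.
Policy = Callable[[List[str]], Dict[str, float]]
Value = Callable[[List[str]], float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_reward(image_vec: Sequence[float], caption_vec: Sequence[float]) -> float:
    """Assumed reward: how close the image and the finished caption lie
    in a shared visual-semantic embedding space (cosine similarity)."""
    return cosine_similarity(image_vec, caption_vec)


def advantage(reward: float, value_estimate: float) -> float:
    """Standard actor-critic signal: reward minus the critic's prediction.
    Positive means the sampled caption beat the value network's expectation."""
    return reward - value_estimate


def lookahead_step(prefix: List[str], policy: Policy, value: Value, lam: float = 0.5) -> str:
    """Greedily choose the next word by mixing local policy confidence
    (log-probability) with the value of the extended caption.
    lam=1.0 follows the policy alone; lam=0.0 follows the value alone."""
    probs = policy(prefix)

    def score(word: str) -> float:
        p = probs[word]
        if p <= 0.0:
            return float("-inf")  # never pick a word the policy rules out
        return lam * math.log(p) + (1.0 - lam) * value(prefix + [word])

    return max(probs, key=score)


if __name__ == "__main__":
    # Toy stand-ins for trained networks (purely illustrative).
    def toy_policy(prefix: List[str]) -> Dict[str, float]:
        return {"dog": 0.6, "cat": 0.4}

    def toy_value(caption: List[str]) -> float:
        return 1.0 if "cat" in caption else 0.0

    print(lookahead_step(["a"], toy_policy, toy_value, lam=0.5))  # cat
    print(lookahead_step(["a"], toy_policy, toy_value, lam=1.0))  # dog
    print(embedding_reward([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

In the toy run, the policy alone prefers "dog" (0.6 vs 0.4), so `lam=1.0` picks "dog". At `lam=0.5`, the value term for captions containing "cat" outweighs the smaller log-probability, so the lookahead picks "cat". This is the "global and lookahead guidance" overriding the purely local choice. A practical decoder would keep a beam of candidate prefixes rather than committing to one word at a time.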
Pages: 1151-1159
Number of pages: 9
Related papers (50 in total)
  • [21] Reinforcement learning-based virtual network embedding: A comprehensive survey
    Lim, Hyun-Kyo
    Ullah, Ihsan
    Han, Youn-Hee
    Kim, Sang-Youn
    [J]. ICT EXPRESS, 2023, 9 (05): : 983 - 994
  • [22] Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
    Guo, Tszhang
    Chang, Shiyu
    Yu, Mo
    Bai, Kun
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 751 - 756
  • [23] A unified benchmark for deep reinforcement learning-based energy management: Novel training ideas with the unweighted reward
    Chen, Jiaxin
    Tang, Xiaolin
    Yang, Kai
    [J]. ENERGY, 2024, 307
  • [24] Learning-Based Image Restoration for Compressed Image through Neighboring Embedding
    Ma, Lin
    Wu, Feng
    Zhao, Debin
    Gao, Wen
    Ma, Siwei
    [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2008, 9TH PACIFIC RIM CONFERENCE ON MULTIMEDIA, 2008, 5353 : 279 - +
  • [25] Robust Adaptive Scaffolding with Inverse Reinforcement Learning-Based Reward Design
    Fahid, Fahmid Morshed
    Rowe, Jonathan P.
    Spain, Randall D.
    Goldberg, Benjamin S.
    Pokorny, Robert
    Lester, James
    [J]. ARTIFICIAL INTELLIGENCE IN EDUCATION: POSTERS AND LATE BREAKING RESULTS, WORKSHOPS AND TUTORIALS, INDUSTRY AND INNOVATION TRACKS, PRACTITIONERS AND DOCTORAL CONSORTIUM, PT II, 2022, 13356 : 204 - 207
  • [26] Adaptive Reward Computation in Reinforcement Learning-Based Continuous Integration Testing
    Yang, Yang
    Pan, Chaoyue
    Li, Zheng
    Zhao, Ruilian
    [J]. IEEE ACCESS, 2021, 9 : 36674 - 36688
  • [28] A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues
    Sharma, Himanshu
    Padha, Devanand
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (11) : 13619 - 13661
  • [29] Deep learning-based computed tomographic image super-resolution via wavelet embedding
    Kim, Hyeongsub
    Lee, Haenghwa
    Lee, Donghoon
    [J]. RADIATION PHYSICS AND CHEMISTRY, 2023, 205
  • [30] Reinforcement Learning Transformer for Image Captioning Generation Model
    Yan, Zhaojie
    [J]. FIFTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2022, 2023, 12701