Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Cited by: 138
Authors
Ren, Zhou [1]
Wang, Xiaoyu [1]
Zhang, Ning [1]
Lv, Xutao [1]
Li, Li-Jia [2]
Affiliations
[1] Snap Inc, Venice, CA USA
[2] Google Inc, Mountain View, CA 94043 USA
DOI
10.1109/CVPR.2017.128
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
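The interplay described in the abstract, where a policy network supplies local next-word confidence and a value network scores each possible extension of the current state, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from this record: the bigram table standing in for the policy network, the overlap-based `value` function standing in for the value network, the mixing weight `lam`, the linear combination of the two scores, and the use of greedy decoding instead of beam search.

```python
import math

# Hypothetical toy bigram table standing in for a trained policy network.
# It maps the previous word to a distribution over next words.
BIGRAM = {
    "<bos>": {"a": 0.9, "dog": 0.05, "cat": 0.05},
    "a": {"cat": 0.55, "dog": 0.45},
    "dog": {"runs": 0.6, "<eos>": 0.4},
    "cat": {"runs": 0.6, "<eos>": 0.4},
    "runs": {"<eos>": 1.0},
}


def policy_log_probs(prefix):
    """Local guidance: log-probability of each next word given the prefix."""
    last = prefix[-1] if prefix else "<bos>"
    return {word: math.log(p) for word, p in BIGRAM[last].items()}


def value(image_concepts, prefix):
    """Lookahead guidance (toy): fraction of image concepts the prefix covers."""
    return len(set(prefix) & image_concepts) / len(image_concepts)


def generate(image_concepts, lam=0.4, max_len=10):
    """Greedy decoding that mixes policy confidence with the value estimate.

    lam = 1.0 uses the policy alone; smaller values weight the value estimate more.
    """
    prefix = []
    for _ in range(max_len):
        # Score every possible extension of the current state.
        scores = {
            word: lam * logp + (1.0 - lam) * value(image_concepts, prefix + [word])
            for word, logp in policy_log_probs(prefix).items()
        }
        best = max(scores, key=scores.get)
        if best == "<eos>":
            break
        prefix.append(best)
    return prefix


image = {"a", "dog", "runs"}  # hypothetical stand-in for image content
print(generate(image, lam=1.0))  # policy only
print(generate(image, lam=0.4))  # policy plus value lookahead
```

In this toy setup the policy alone slightly prefers "cat" after "a", while the value term steers decoding toward the word that matches the image content, which is the kind of correction the abstract attributes to the value network.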
Pages: 1151-1159
Page count: 9
Related papers
  • [31] Image Captioning using Reinforcement Learning with BLUDEr Optimization
    Devi, P. R.
    Thrivikraman, V
    Kashyap, D.
    Shylaja, S. S.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2020, 30 (04) : 607 - 613
  • [32] A Deep Reinforcement Learning-Based Framework for Content Caching
    Zhong, Chen
    Gursoy, M. Cenk
    Velipasalar, Senem
    [J]. 2018 52ND ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2018,
  • [33] Deep reinforcement learning-based robust missile guidance
    Ahn, Jeongsu
    Shin, Jongho
    Kim, Hyeong-Geun
    [J]. 2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 927 - 930
  • [34] A Deep Reinforcement Learning-Based Approach in Porker Game
    Kong, Yan
    Rui, Yefeng
    Hsia, Chih-Hsien
    [J]. Journal of Computers (Taiwan), 2023, 34 (02) : 41 - 51
  • [35] Image Captioning using Adversarial Networks and Reinforcement Learning
    Yan, Shiyang
    Wu, Fangyu
    Smith, Jeremy S.
    Lu, Wenjin
    Zhang, Bailing
    [J]. 2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 248 - 253
  • [36] Deep Reinforcement Learning-based Traffic Signal Control
    Ruan, Junyun
    Tang, Jinzhuo
    Gao, Ge
    Shi, Tianyu
    Khamis, Alaa
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON SMART MOBILITY, SM, 2023, : 21 - 26
  • [37] Deep reinforcement learning-based antilock braking algorithm
    Mantripragada, V. Krishna Teja
    Kumar, R. Krishna
    [J]. VEHICLE SYSTEM DYNAMICS, 2023, 61 (05) : 1410 - 1431
  • [38] Deep Reinforcement Learning-Based Defense Strategy Selection
    Charpentier, Axel
    Boulahia-Cuppens, Nora
    Cuppens, Frederic
    Yaich, Reda
    [J]. PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AVAILABILITY, RELIABILITY AND SECURITY, ARES 2022, 2022,
  • [39] Image Captioning using Reinforcement Learning with BLUDEr Optimization
    P. R. Devi
    V. Thrivikraman
    D. Kashyap
    S. S. Shylaja
    [J]. Pattern Recognition and Image Analysis, 2020, 30 : 607 - 613
  • [40] Computing on Wheels: A Deep Reinforcement Learning-Based Approach
    Kazmi, S. M. Ahsan
    Tai Manh Ho
    Tuong Tri Nguyen
    Fahim, Muhammad
    Khan, Adil
    Piran, Md Jalil
    Baye, Gaspard
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (11) : 22535 - 22548