Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning

Times cited: 76
Authors
Xu, Ning [1 ]
Zhang, Hanwang [2 ]
Liu, An-An [1 ]
Nie, Weizhi [1 ]
Su, Yuting [1 ]
Nie, Jie [3 ]
Zhang, Yongdong [4 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
[3] Ocean Univ China, Coll Informat Sci & Engn, Qingdao 266100, Peoples R China
[4] Univ Sci & Technol China, Hefei 230027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visualization; Measurement; Task analysis; Reinforcement learning; Optimization; Adaptation models; Semantics; Multi-level policy; multi-level reward; image captioning
DOI
10.1109/TMM.2019.2941820
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Image captioning is one of the most challenging tasks in AI because it requires understanding both complex visual content and natural language. Because image captioning is essentially a sequential prediction task, recent work has used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods rely primarily on a single policy network and a single reward function, an approach that is poorly matched to the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To address this problem, we propose a novel multi-level policy and reward RL framework for image captioning that can be easily integrated with RNN-based captioning models, language metrics, or visual-semantic functions for optimization. Specifically, the proposed framework includes two modules: 1) a multi-level policy network that jointly updates the word- and sentence-level policies for word generation; and 2) a multi-level reward function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Furthermore, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analyses on the MSCOCO and Flickr30k datasets show that the proposed framework achieves competitive performance on a variety of evaluation metrics. In addition, we conduct ablation studies on multiple variants of the framework, and we evaluate its generalization ability by testing several representative image captioning models as the word-level policy network and several metrics as the language-language reward function.
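The abstract describes a reward that mixes a vision-language term with a language-language term, used to guide an RL update of the captioning policy. The following Python sketch illustrates only that general idea under stated assumptions: the clipped unigram-precision reward (a stand-in for a metric such as BLEU or CIDEr), the cosine-similarity vision reward, the mixing weight `alpha`, and all function names are hypothetical. The plain REINFORCE loss with a baseline is a generic policy-gradient estimator, not the paper's multi-level policy network or its guidance term.

```python
import math
from collections import Counter
from typing import Sequence


def language_reward(candidate: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Hypothetical language-language reward: clipped unigram precision against the
    reference captions (a simple stand-in for metrics such as BLEU or CIDEr)."""
    if not candidate:
        return 0.0
    # For each word, the maximum number of times it appears in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    # Clip candidate word counts so repeated words are not over-rewarded.
    clipped = sum(min(count, max_ref_counts[word]) for word, count in Counter(candidate).items())
    return clipped / len(candidate)


def vision_reward(image_emb: Sequence[float], sentence_emb: Sequence[float]) -> float:
    """Hypothetical vision-language reward: cosine similarity between an image embedding
    and a sentence embedding assumed to live in a shared visual-semantic space."""
    dot = sum(a * b for a, b in zip(image_emb, sentence_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in sentence_emb))
    return dot / norm if norm > 0 else 0.0


def multi_level_reward(candidate, references, image_emb, sentence_emb, alpha: float = 0.5) -> float:
    """Weighted sum of the two rewards; alpha is a hypothetical mixing weight."""
    return (alpha * language_reward(candidate, references)
            + (1 - alpha) * vision_reward(image_emb, sentence_emb))


def reinforce_loss(word_log_probs: Sequence[float], reward: float, baseline: float) -> float:
    """REINFORCE surrogate loss -(r - b) * sum_t log p(w_t | w_<t, I) for one sampled caption.
    Minimizing it raises the probability of captions whose reward beats the baseline."""
    advantage = reward - baseline
    return -advantage * sum(word_log_probs)


if __name__ == "__main__":
    cand = "a dog runs on the grass".split()
    refs = ["a dog is running on the grass".split(), "a brown dog plays outside".split()]
    r = multi_level_reward(cand, refs, image_emb=[1.0, 0.0], sentence_emb=[1.0, 0.0])
    print(round(r, 4))  # 0.5 * 5/6 + 0.5 * 1.0 ≈ 0.9167
    print(round(reinforce_loss([-0.5, -1.0, -0.2], reward=r, baseline=0.6), 4))
```

Subtracting a baseline (for example, the reward of a greedily decoded caption, as in self-critical sequence training) reduces the variance of the gradient estimate without biasing it. In a real captioning model the log-probabilities would come from the RNN decoder and the loss would be differentiated by an autodiff framework rather than evaluated as a plain float.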
Pages: 1372-1383
Number of pages: 12