Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Cited by: 138
Authors
Ren, Zhou [1 ]
Wang, Xiaoyu [1 ]
Zhang, Ning [1 ]
Lv, Xutao [1 ]
Li, Li-Jia [2 ]
Affiliations
[1] Snap Inc, Venice, CA USA
[2] Google Inc, Mountain View, CA 94043 USA
DOI
10.1109/CVPR.2017.128
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the correct words towards the goal of generating captions similar to the ground truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
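The interplay the abstract describes, a policy network scoring the next word locally and a value network scoring each extension globally, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state how the two signals are combined, so the weighted sum with a hypothetical weight `lam` is only one plausible choice, the hand-written dictionaries stand in for real network outputs, and cosine similarity is used as a stand-in for a visual-semantic embedding reward.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors.

    Stand-in (assumption) for an embedding-based reward: similarity between
    an image embedding and a caption embedding in a shared space.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_next_word(policy_probs, value_estimates, lam=0.5):
    """Pick the next word from local and lookahead guidance.

    policy_probs:    word -> probability from the (hypothetical) policy network
    value_estimates: word -> value of the state reached by appending that word,
                     from the (hypothetical) value network
    lam:             assumed mixing weight; lam=1.0 ignores the value network
    """
    best_word, best_score = None, -math.inf
    for word, prob in policy_probs.items():
        if prob <= 0.0:
            continue  # log(0) is undefined; skip impossible words
        score = lam * math.log(prob) + (1.0 - lam) * value_estimates[word]
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Toy numbers, invented for illustration only.
policy_probs = {"dog": 0.5, "cat": 0.4, "car": 0.1}
value_estimates = {"dog": 0.2, "cat": 0.9, "car": 0.1}

greedy_word, _ = select_next_word(policy_probs, value_estimates, lam=1.0)
guided_word, _ = select_next_word(policy_probs, value_estimates, lam=0.5)
```

With these toy numbers the policy alone prefers "dog", while the combined score prefers "cat" because its extension has a much higher estimated value. This is the sense in which lookahead guidance can override a locally confident prediction.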
Pages: 1151-1159
Page count: 9