Mind's Eye: A Recurrent Visual Representation for Image Caption Generation

Cited by: 0
Authors
Chen, Xinlei [1]
Zitnick, C. Lawrence [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Redmond, WA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are equal to or preferred by humans 21.0% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
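The abstract describes a recurrent model whose hidden state serves as an evolving visual memory: it is updated word by word and can drive both caption generation and reconstruction of visual features from a sentence. The following is a minimal, illustrative sketch of that idea only; the module names, layer sizes, and wiring are hypothetical choices for illustration and are not the authors' actual architecture.

# Minimal sketch: a recurrent "visual memory" updated per word, with one head
# that predicts the next caption word and one that estimates image features.
# All sizes and names below are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class RecurrentVisualMemory(nn.Module):
    def __init__(self, vocab_size, word_dim=256, hidden_dim=512, visual_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        # Hidden state acts as the evolving memory of the scene being described.
        self.rnn = nn.GRUCell(word_dim, hidden_dim)
        self.next_word = nn.Linear(hidden_dim, vocab_size)     # caption generation head
        self.visual_recon = nn.Linear(hidden_dim, visual_dim)  # visual feature reconstruction head

    def forward(self, word_ids):
        """word_ids: (batch, seq_len) token indices of a caption."""
        batch, seq_len = word_ids.shape
        h = word_ids.new_zeros(batch, self.rnn.hidden_size, dtype=torch.float)
        word_logits, visual_estimates = [], []
        for t in range(seq_len):
            h = self.rnn(self.embed(word_ids[:, t]), h)     # update memory with word t
            word_logits.append(self.next_word(h))           # predict the following word
            visual_estimates.append(self.visual_recon(h))   # current guess of the image features
        return torch.stack(word_logits, 1), torch.stack(visual_estimates, 1)

# Usage sketch: train with a captioning loss on word_logits plus a regression loss
# between visual_estimates and CNN features of the paired image.
model = RecurrentVisualMemory(vocab_size=10000)
logits, vis = model(torch.randint(0, 10000, (2, 7)))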
Pages: 2422-2431
Page count: 10
Related papers
50 items in total (first 10 listed)
  • [1] Cascade recurrent neural network for image caption generation. Wu, Jie; Hu, Haifeng. Electronics Letters, 2017, 53(25): 1642-1643
  • [2] Recurrent Attention LSTM Model for Image Chinese Caption Generation. Zhang, Chaoying; Dai, Yaping; Cheng, Yanyan; Jia, Zhiyang; Hirota, Kaoru. 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS), 2018: 808-813
  • [3] Visual Image Caption Generation for Service Robotics and Industrial Applications. Luo, Ren C.; Hsu, Yu-Ting; Wen, Yu-Cheng; Ye, Huan-Jun. 2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS 2019), 2019: 827-832
  • [4] VSAM-Based Visual Keyword Generation for Image Caption. Zhang, Suya; Zhang, Yana; Chen, Zeyu; Li, Zhaohui. IEEE Access, 2021, 9: 27638-27649
  • [5] Image Caption Generation with Hierarchical Contextual Visual Spatial Attention. Khademi, Mahmoud; Schulte, Oliver. Proceedings 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018: 2024-2032
  • [6] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Xu, Kelvin; Ba, Jimmy Lei; Kiros, Ryan; Cho, Kyunghyun; Courville, Aaron; Salakhutdinov, Ruslan; Zemel, Richard S.; Bengio, Yoshua. International Conference on Machine Learning, Vol. 37, 2015: 2048-2057
  • [7] Clothes image caption generation with attribute detection and visual attention model. Li, Xianrui; Ye, Zhiling; Zhang, Zhao; Zhao, Mingbo. Pattern Recognition Letters, 2021, 141: 68-74
  • [8] Chinese Image Caption Generation via Visual Attention and Topic Modeling. Liu, Maofu; Hu, Huijun; Li, Lingjun; Yu, Yan; Guan, Weili. IEEE Transactions on Cybernetics, 2022, 52(2): 1247-1257
  • [9] TVPRNN for image caption generation. Yang, Liang; Hu, Haifeng. Electronics Letters, 2017, 53(22): 1471+
  • [10] GVA: guided visual attention approach for automatic image caption generation. Hossen, Md. Bipul; Ye, Zhongfu; Abdussalam, Amr; Hossain, Md. Imran. Multimedia Systems, 2024, 30(1)