Mind's Eye: A Recurrent Visual Representation for Image Caption Generation

Authors
Chen, Xinlei [1]
Zitnick, C. Lawrence [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Redmond, WA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image and reconstructing visual features given an image description. We evaluate our approach on several tasks: sentence generation, sentence retrieval, and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human-generated captions, our automatically generated captions are equal to or preferred by humans 21.0% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
Pages: 2422-2431
Page count: 10