Automatic Caption Generation for News Images

Cited by: 61
Authors
Feng, Yansong [1]
Lapata, Mirella [2]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, 128 Zhong Guan Cun N St, Beijing 100871, Peoples R China
[2] Univ Edinburgh, Informat Forum, Inst Language Cognit & Computat, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
Keywords
Caption generation; image annotation; summarization; topic models; NATURAL-LANGUAGE
DOI
10.1109/TPAMI.2012.118
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Examples include video and image retrieval as well as the development of tools that aid visually impaired individuals to access pictorial information. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned and colocated with thematically related documents. Our model learns to create captions from a database of news articles, the pictures embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset (which treats the captions and associated news articles as image labels). Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article, while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to handwritten captions and is often superior to extractive methods.
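The two-stage idea in the abstract (keywords suggested for the image, then a caption realized from the article) can be illustrated with a deliberately simplified extractive sketch. This is not the authors' model: the keyword probabilities below are made-up stand-ins for the output of an image annotation model, and the cosine-overlap scoring, the function names, and the sample sentences are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extractive_caption(keyword_probs, sentences):
    """Return the article sentence most similar to the image keywords.

    keyword_probs: hypothetical output of the content-selection stage,
    mapping each suggested keyword to a probability.
    """
    scored = [(cosine(keyword_probs, Counter(tokenize(s))), s) for s in sentences]
    # max() keeps the first sentence on ties, so document order breaks ties.
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical keyword distribution for one news image (assumed values).
keyword_probs = {"flood": 0.5, "river": 0.3, "rescue": 0.2}

# Hypothetical sentences from the accompanying article.
article_sentences = [
    "The council met on Tuesday to discuss the budget.",
    "Rescue teams worked along the river after the flood.",
    "Officials said more rain is expected.",
]

caption = extractive_caption(keyword_probs, article_sentences)
print(caption)
```

An extractive selector of this kind can only return a sentence that already exists in the article, which is the limitation the abstract's abstractive models are meant to address.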
Pages: 797-812
Number of pages: 16