A survey on automatic image caption generation

Cited by: 111
Authors
Bai, Shuang [1 ]
An, Shan [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, 3 Shang Yuan Cun, Beijing, Peoples R China
[2] Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Sentence template; Deep neural networks; Multimodal embedding; Encoder-decoder framework; Attention mechanism; NEURAL-NETWORKS; DEEP; REPRESENTATION; SCENE;
DOI
10.1016/j.neucom.2018.05.080
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Image captioning is the task of automatically generating a natural-language description of an image. As a recently emerged research area, it is attracting increasing attention. Image captioning requires capturing the semantic content of an image and expressing it in natural language; bridging the computer vision and natural language processing communities, it is a challenging task, and various approaches have been proposed to solve it. In this paper, we present a survey of advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into categories, summarize representative methods in each category, and discuss their strengths and limitations. We first discuss methods used in early work, which are mainly retrieval based and template based. We then focus on neural network based methods, which achieve state-of-the-art results; these are further divided into subcategories according to the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets, followed by a discussion of future research directions. (C) 2018 Elsevier B.V. All rights reserved.
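The encoder-decoder framework the abstract refers to can be sketched in a few lines: an encoder condenses the image into a feature representation, and a decoder emits the caption one word at a time, conditioned on that feature and the words generated so far. The sketch below is a deliberately toy Python illustration (a hypothetical vocabulary and a hand-coded transition table stand in for a trained CNN encoder and RNN decoder); it shows the generation loop's structure, not the method of any specific paper in the survey.

```python
# Toy encoder-decoder captioning loop. The "model" components here are
# hand-coded stand-ins, not trained networks.

VOCAB = ["<start>", "a", "dog", "on", "grass", "<end>"]

def encode(image):
    # Stand-in for a CNN encoder: collapse the "image" (a grid of
    # numbers) into a single feature value.
    return sum(sum(row) for row in image)

def decoder_step(feature, prev_word):
    # Stand-in for one RNN decoder step: deterministically map the
    # previous word to the next one. A real decoder would score every
    # vocabulary word given (feature, hidden state) and take the argmax.
    next_of = {"<start>": "a", "a": "dog", "dog": "on",
               "on": "grass", "grass": "<end>"}
    return next_of[prev_word]

def generate_caption(image, max_len=10):
    # Greedy decoding: start from <start>, append words until <end>
    # is produced or the length limit is hit.
    feature = encode(image)
    words, word = [], "<start>"
    for _ in range(max_len):
        word = decoder_step(feature, word)
        if word == "<end>":
            break
        words.append(word)
    return " ".join(words)
```

In the real systems the survey covers, `decoder_step` would also carry a hidden state between steps, and attention-based variants would recompute a weighted view of spatial image features at every step instead of using one fixed vector.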
Pages: 291-304 (14 pages)