A survey on automatic image caption generation

Cited by: 111
Authors
Bai, Shuang [1]
An, Shan [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, 3 Shang Yuan Cun, Beijing, Peoples R China
[2] Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Sentence template; Deep neural networks; Multimodal embedding; Encoder-decoder framework; Attention mechanism; NEURAL-NETWORKS; DEEP; REPRESENTATION; SCENE;
DOI
10.1016/j.neucom.2018.05.080
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning means automatically generating a caption for an image. As a recently emerged research area, it is attracting increasing attention. To achieve the goal of image captioning, the semantic information of images needs to be captured and expressed in natural language. Because it connects the research communities of computer vision and natural language processing, image captioning is a challenging task. Various approaches have been proposed to solve this problem. In this paper, we present a survey on advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into different categories. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss methods used in early work, which are mainly retrieval based and template based. Then, we focus on neural-network-based methods, which give state-of-the-art results. These methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets. Finally, future research directions are discussed. © 2018 Elsevier B.V. All rights reserved.
Pages: 291-304
Page count: 14