A survey on automatic image caption generation

Cited by: 111
Authors
Bai, Shuang [1]
An, Shan [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, 3 Shang Yuan Cun, Beijing, Peoples R China
[2] Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Sentence template; Deep neural networks; Multimodal embedding; Encoder-decoder framework; Attention mechanism; NEURAL-NETWORKS; DEEP; REPRESENTATION; SCENE
DOI
10.1016/j.neucom.2018.05.080
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning means automatically generating a caption for an image. As a recently emerged research area, it is attracting more and more attention. To achieve the goal of image captioning, the semantic information of images needs to be captured and expressed in natural language. Connecting the research communities of computer vision and natural language processing, image captioning is a quite challenging task. Various approaches have been proposed to solve this problem. In this paper, we present a survey on advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into different categories. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss methods used in early work, which are mainly retrieval-based and template-based. Then, we focus our main attention on neural-network-based methods, which give state-of-the-art results. Neural-network-based methods are further divided into subcategories based on the specific framework they use. Each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets. Finally, future research directions are discussed. © 2018 Elsevier B.V. All rights reserved.
Pages: 291-304
Number of pages: 14