A survey on automatic image caption generation

Cited by: 120

Authors:

Bai, Shuang [1]

An, Shan [2]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, 3 Shang Yuan Cun, Beijing, Peoples R China

[2] Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China

Source:

NEUROCOMPUTING | 2018 / Volume 311

Funding:

National Natural Science Foundation of China;

Keywords:

Image captioning; Sentence template; Deep neural networks; Multimodal embedding; Encoder-decoder framework; Attention mechanism; NEURAL-NETWORKS; DEEP; REPRESENTATION; SCENE;

DOI:

10.1016/j.neucom.2018.05.080

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image captioning means automatically generating a caption for an image. As a recently emerged research area, it is attracting increasing attention. To achieve this goal, the semantic information of an image must be captured and expressed in natural language. Because it connects the research communities of computer vision and natural language processing, image captioning is a challenging task, and various approaches have been proposed to solve it. In this paper, we present a survey of advances in image captioning research. We classify image captioning approaches into categories based on the technique adopted, summarize representative methods in each category, and discuss their strengths and limitations. We first discuss the methods used in early work, which are mainly retrieval-based and template-based. We then focus on neural-network-based methods, which give state-of-the-art results. These methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets. Finally, future research directions are discussed. (C) 2018 Elsevier B.V. All rights reserved.


Pages: 291-304

Page count: 14
