Deep Learning Approaches on Image Captioning: A Review

Cited by: 14
Authors
Ghandi, Taraneh [1]
Pourreza, Hamidreza [2]
Mahyar, Hamidreza [1]
Affiliations
[1] McMaster Univ, 1280 Main St West, Hamilton, ON L8S 4L8, Canada
[2] Ferdowsi Univ Mashhad, Azadi Sq, Mashhad 9177948974, Razavi Khorasan, Iran
Keywords
Image captioning; deep learning; text generation; LANGUAGE
DOI
10.1145/3617592
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Image captioning is a research area of immense importance, aiming to generate natural language descriptions for visual content in the form of still images. The advent of deep learning and more recently vision-language pre-training techniques has revolutionized the field, leading to more sophisticated methods and improved performance. In this survey article, we provide a structured review of deep learning methods in image captioning by presenting a comprehensive taxonomy and discussing each method category in detail. Additionally, we examine the datasets commonly employed in image captioning research, as well as the evaluation metrics used to assess the performance of different captioning models. We address the challenges faced in this field by emphasizing issues such as object hallucination, missing context, illumination conditions, contextual understanding, and referring expressions. We rank different deep learning methods' performance according to widely used evaluation metrics, giving insight into the current state-of-the-art. Furthermore, we identify several potential future directions for research in this area, which include tackling the information misalignment problem between image and text modalities, mitigating dataset bias, incorporating vision-language pre-training methods to enhance caption generation, and developing improved evaluation tools to accurately measure the quality of image captions.
Pages: 37
Related papers
50 items in total
  • [31] Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (02)
  • [32] Automatic Bangla Image Captioning Based on Transformer Model in Deep Learning
    Hossain, Md Anwar
    Hasan, Mirza A. F. M. Rashidul
    Hossen, Ebrahim
    Asraful, Md
    Faruk, Md Omar
    Abadin, A. F. M. Zainul
    Ali, Md Suhag
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (11) : 1110 - 1117
  • [33] Double awareness mechanism based deep learning framework for image captioning
    Gaurav
    Mathur, Pratistha
    [J]. JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2023, 26 (06) : 1801 - 1817
  • [34] A reference-based model using deep learning for image captioning
    Nogueira, Tiago do Carmo
    Noronha Vinhal, Cassio Dener
    da Cruz, Gelson, Jr.
    Diedrich Ullmann, Matheus Rudolfo
    Marques, Thyago Carvalho
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (03) : 1665 - 1681
  • [35] Enhanced Image Captioning with Color Recognition Using Deep Learning Methods
    Chang, Yeong-Hwa
    Chen, Yen-Jen
    Huang, Ren-Hung
    Yu, Yi-Ting
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (01)
  • [36] Deep Reinforcement Learning-based Image Captioning with Embedding Reward
    Ren, Zhou
    Wang, Xiaoyu
    Zhang, Ning
    Lv, Xutao
    Li, Li-Jia
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017 : 1151 - 1159
  • [37] Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning
    Omri, Mohamed
    Abdel-Khalek, Sayed
    Khalil, Eied M.
    Bouslimi, Jamel
    Joshi, Gyanendra Prasad
    [J]. MATHEMATICS, 2022, 10 (03)
  • [38] Towards Unified Deep Learning Model for NSFW Image and Video Captioning
    Ko, Jong-Won
    Hwang, Dong-Hyun
    [J]. ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING, MUE/FUTURETECH 2018, 2019, 518 : 57 - 63
  • [39] Metaheuristics Optimization with Deep Learning Enabled Automated Image Captioning System
    Al Duhayyim, Mesfer
    Alazwari, Sana
    Mengash, Hanan Abdullah
    Marzouk, Radwa
    Alzahrani, Jaber S.
    Mahgoub, Hany
    Althukair, Fahd
    Salama, Ahmed S.
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (15)
  • [40] Contrastive Learning for Image Captioning
    Dai, Bo
    Lin, Dahua
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30