A reference-based model using deep learning for image captioning

Cited by: 6
Authors
Nogueira, Tiago do Carmo [1]
Noronha Vinhal, Cassio Dener [2]
da Cruz, Gelson, Jr. [2]
Diedrich Ullmann, Matheus Rudolfo [3]
Marques, Thyago Carvalho [2]
Affiliations
[1] Fed Inst Baiano IFBaiano, Bom Jesus Da Lapa, Brazil
[2] Fed Univ Goias UFG, Sch Elect Mech & Comp Engn EMC, Goiania, Go, Brazil
[3] Fed Inst Bahia IFBA, Barreiras, Brazil
Keywords
Gated recurrent units; Caption generation references; Convolutional neural network; Part-of-speech tagging; ATTENTION; LSTM; NETWORK
DOI
10.1007/s00530-022-00937-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Describing images in natural language is a challenging task for computer vision. Image captioning is the task of creating image descriptions. Deep learning architectures that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are well suited to this task. However, traditional RNNs may suffer from exploding gradients, vanishing gradients, and non-descriptive sentences. To address these problems, we propose a model based on the encoder-decoder structure, using CNNs to extract features from reference images and gated recurrent units (GRUs) to create the descriptions. Our model applies part-of-speech (PoS) analysis and the likelihood function to generate weights in the GRU. The method also performs knowledge transfer during a validation phase using the k-nearest neighbors (kNN) technique. Our experimental results on the Flickr30k and MS-COCO datasets indicate that the proposed PoS-based model yields scores competitive with those of high-end models. The system predicts more descriptive captions and closely approximates the expected captions for both the predicted and kNN-selected captions.
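The abstract mentions selecting captions with the k-nearest neighbors (kNN) technique but does not give the procedure. As a rough illustration only, the sketch below assumes that each reference image is represented by a CNN feature vector paired with a caption, and that neighbors are ranked by cosine similarity. The function names, the similarity measure, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def knn_reference_captions(
    query: Sequence[float],
    references: List[Tuple[Sequence[float], str]],
    k: int = 2,
) -> List[str]:
    """Return the captions of the k reference images most similar to query.

    Assumption: similarity in feature space is a usable proxy for
    similarity in image content. The paper may use a different measure.
    """
    scored = [(cosine_similarity(query, feat), cap) for feat, cap in references]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [cap for _, cap in scored[:k]]


# Toy 3-dimensional "features" standing in for real CNN activations.
REFERENCES = [
    ([1.0, 0.0, 0.0], "a dog runs on the grass"),
    ([0.9, 0.1, 0.0], "a brown dog plays outside"),
    ([0.0, 1.0, 0.0], "a man rides a bicycle"),
    ([0.0, 0.0, 1.0], "a plate of food on a table"),
]

if __name__ == "__main__":
    for caption in knn_reference_captions([1.0, 0.05, 0.0], REFERENCES, k=2):
        print(caption)
```

In a real pipeline the feature vectors would come from a pretrained CNN encoder, and the retrieved captions could serve as references alongside the decoder's own prediction; how the paper combines the two is not stated in the abstract.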
Pages: 1665 - 1681
Number of pages: 17
Related papers
  • [1] A reference-based model using deep learning for image captioning
    Tiago do Carmo Nogueira
    Cássio Dener Noronha Vinhal
    Gélson da Cruz Júnior
    Matheus Rudolfo Diedrich Ullmann
    Thyago Carvalho Marques
    [J]. Multimedia Systems, 2023, 29 : 1665 - 1681
  • [2] Reference-based model using multimodal gated recurrent units for image captioning
    Nogueira, Tiago do Carmo
    Vinhal, Cassio Dener Noronha
    da Cruz Junior, Gelson
    Ullmann, Matheus Rudolfo Diedrich
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (41-42) : 30615 - 30635
  • [3] Reference-based model using multimodal gated recurrent units for image captioning
    Tiago do Carmo Nogueira
    Cássio Dener Noronha Vinhal
    Gélson da Cruz Júnior
    Matheus Rudolfo Diedrich Ullmann
    [J]. Multimedia Tools and Applications, 2020, 79 : 30615 - 30635
  • [4] Rethinking the Reference-based Distinctive Image Captioning
    Mao, Yangjun
    Chen, Long
    Jiang, Zhihong
    Zhang, Dong
    Zhang, Zhimeng
    Shao, Jian
    Xiao, Jun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 4374 - 4384
  • [5] SAR Image Despeckling by Noisy Reference-Based Deep Learning Method
    Ma, Xiaoshuang
    Wang, Chen
    Yin, Zhixiang
    Wu, Penghai
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2020, 58 (12) : 8807 - 8818
  • [6] Image Captioning using Deep Learning
    Jain, Yukti Sanjay
    Dhopeshwar, Tanisha
    Chadha, Supreet Kaur
    Pagire, Vrushali
    [J]. 2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021
  • [7] Image Captioning Using Deep Learning
    Adithya, Paluvayi Veera
    Kalidindi, Mourya Viswanadh
    Swaroop, Nallani Jyothi
    Vishwas, H. N.
    [J]. ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 42 - 58
  • [8] Automatic Bangla Image Captioning Based on Transformer Model in Deep Learning
    Hossain, Md Anwar
    Hasan, Mirza A. F. M. Rashidul
    Hossen, Ebrahim
    Asraful, Md
    Faruk, Md Omar
    Abadin, A. F. M. Zainul
    Ali, Md Suhag
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (11) : 1110 - 1117
  • [9] Image and Video Captioning for Apparels Using Deep Learning
    Agarwal, Govind
    Jindal, Kritika
    Chowdhury, Abishi
    Singh, Vishal K.
    Pal, Amrit
    [J]. IEEE ACCESS, 2024, 12 : 113138 - 113150
  • [10] Generative image captioning in Urdu using deep learning
    Afzal M.K.
    Shardlow M.
    Tuarob S.
    Zaman F.
    Sarwar R.
    Ali M.
    Aljohani N.R.
    Lytras M.D.
    Nawaz R.
    Hassan S.-U.
    [J]. Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (06) : 7719 - 7731