Reference-based model using multimodal gated recurrent units for image captioning

Cited by: 17
Authors
Nogueira, Tiago do Carmo [1 ]
Vinhal, Cassio Dener Noronha [1 ]
da Cruz Junior, Gelson [1 ]
Ullmann, Matheus Rudolfo Diedrich [1 ]
Affiliations
[1] Fed Univ Goias UFG, Sch Elect Mech & Comp Engn EMC, Goiania, Go, Brazil
Keywords
Gated recurrent units; Caption generation references; Convolutional neural network; ATTENTION; LSTM; NETWORK;
DOI
10.1007/s11042-020-09539-5
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Describing images through natural language is a challenging task in the field of computer vision. Image captioning consists of creating image descriptions that can be accomplished via deep learning architectures that use convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, traditional RNNs encounter problems such as exploding and vanishing gradients, and they exhibit poor performance when generating non-descriptive sentences. To solve these issues, we proposed a model based on the encoder-decoder structure using CNNs to extract the image features and multimodal gated recurrent units (GRU) for descriptions. This model implements the part-of-speech (PoS) and likelihood function for weight generation in the GRU. The method performs knowledge transfer during a validation phase that uses the k-nearest neighbors (kNN) technique. Experimental results using the Flickr30k and MSCOCO datasets demonstrated that the proposed PoS-based model presents competitive scores in comparison to state-of-the-art models. The system predicts more descriptive captions and closely approximates the expected captions both in the predicted and kNN-selected captions.
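The abstract mentions selecting reference captions with kNN but does not give the feature representation or the distance metric. As a rough illustration only, the sketch below retrieves the captions of the k training images closest to a query image. The use of cosine similarity, the function names, and the toy feature vectors are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_reference_captions(
    query: Sequence[float],
    train_features: Sequence[Sequence[float]],
    train_captions: Sequence[str],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (similarity, caption) pairs most similar to the query.

    Assumption: features are fixed-length vectors, e.g. produced by a CNN
    encoder; the metric (cosine similarity) is chosen for illustration.
    """
    scored = [
        (cosine_similarity(query, feat), cap)
        for feat, cap in zip(train_features, train_captions)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy data standing in for CNN image features and their captions.
train_features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
train_captions = [
    "a dog runs on grass",
    "a dog plays in a park",
    "a man rides a bike",
    "a plate of food",
]
query = [1.0, 0.05, 0.0]

references = knn_reference_captions(query, train_features, train_captions, k=2)
for score, caption in references:
    print(f"{score:.3f}  {caption}")
```

In a setup like the one the abstract describes, the retrieved captions would serve as references alongside the decoder's own prediction; how the paper weights or combines them is not stated here.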
Pages: 30615 - 30635
Number of pages: 21
Related papers
50 items in total
  • [1] Reference-based model using multimodal gated recurrent units for image captioning
    Tiago do Carmo Nogueira
    Cássio Dener Noronha Vinhal
    Gélson da Cruz Júnior
    Matheus Rudolfo Diedrich Ullmann
    [J]. Multimedia Tools and Applications, 2020, 79 : 30615 - 30635
  • [2] A reference-based model using deep learning for image captioning
    Tiago do Carmo Nogueira
    Cássio Dener Noronha Vinhal
    Gélson da Cruz Júnior
    Matheus Rudolfo Diedrich Ullmann
    Thyago Carvalho Marques
    [J]. Multimedia Systems, 2023, 29 : 1665 - 1681
  • [3] A reference-based model using deep learning for image captioning
    Nogueira, Tiago do Carmo
    Noronha Vinhal, Cassio Dener
    da Cruz, Gelson, Jr.
    Diedrich Ullmann, Matheus Rudolfo
    Marques, Thyago Carvalho
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (03) : 1665 - 1681
  • [4] Rethinking the Reference-based Distinctive Image Captioning
    Mao, Yangjun
    Chen, Long
    Jiang, Zhihong
    Zhang, Dong
    Zhang, Zhimeng
    Shao, Jian
    Xiao, Jun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4374 - 4384
  • [5] Improving Reference-Based Distinctive Image Captioning with Contrastive Rewards
    Mao, Yangjun
    Xiao, Jun
    Zhang, Dong
    Cao, Meng
    Shao, Jian
    Zhuang, Yueting
    Chen, Long
    [J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2024, 20 (12)
  • [6] Recurrent Neural Network for Content Based Image Retrieval Using Image Captioning Model
    Sindu, S.
    Kousalya, R.
    [J]. COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING, 2020, 1108 : 1067 - 1077
  • [7] Gated Recurrent Units and Recurrent Neural Network Based Multimodal Approach for Automatic Video Summarization
    Kaur, Lakhwinder
    Aljrees, Turki
    Kumar, Ankit
    Pandey, Saroj Kumar
    Singh, Kamred Udham
    Mishra, Pankaj Kumar
    Singh, Teekam
    [J]. TRAITEMENT DU SIGNAL, 2023, 40 (03) : 1227 - 1234
  • [8] Reference Based LSTM for Image Captioning
    Chen, Minghai
    Ding, Guiguang
    Zhao, Sicheng
    Chen, Hui
    Han, Jungong
    Liu, Qiang
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3981 - 3987
  • [9] Image Captioning with Synergy-Gated Attention and Recurrent Fusion LSTM
    Yang, Yo
    Chen, Lizhi
    Pan, Longyue
    Hu, Juntao
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (10) : 3390 - 3405
  • [10] Automated Image Captioning with Multi-layer Gated Recurrent Unit
    Moral, Ozge Taylan
    Kilic, Volkan
    Onan, Aytug
    Wang, Wenwu
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1160 - 1164