Reference-based model using multimodal gated recurrent units for image captioning

Cited by: 17

Authors:

Nogueira, Tiago do Carmo [1]

Vinhal, Cassio Dener Noronha [1]

da Cruz Junior, Gelson [1]

Ullmann, Matheus Rudolfo Diedrich [1]

Affiliations:

[1] Fed Univ Goias UFG, Sch Elect Mech & Comp Engn EMC, Goiania, Go, Brazil

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2020 / Volume 79 / Issue 41-42

Keywords:

Gated recurrent units; Caption generation references; Convolutional neural network; ATTENTION; LSTM; NETWORK;

DOI:

10.1007/s11042-020-09539-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Describing images through natural language is a challenging task in the field of computer vision. Image captioning consists of creating image descriptions that can be accomplished via deep learning architectures that use convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, traditional RNNs encounter problems such as exploding and vanishing gradients, and they exhibit poor performance when generating non-descriptive sentences. To solve these issues, we proposed a model based on the encoder-decoder structure using CNNs to extract the image features and multimodal gated recurrent units (GRU) for descriptions. This model implements the part-of-speech (PoS) and likelihood function for weight generation in the GRU. The method performs knowledge transfer during a validation phase that uses the k-nearest neighbors technique (kNN). Experimental results using the Flickr30k and MSCOCO datasets demonstrated that the proposed PoS-based model presents competitive scores in comparison to state-of-the-art models. The system predicts more descriptive captions and closely approximates the expected captions both in the predicted and kNN-selected captions.

Pages: 30615-30635

Page count: 21
