A reference-based model using deep learning for image captioning

Cited by: 6

Authors:

Nogueira, Tiago do Carmo [1]

Noronha Vinhal, Cassio Dener [2]

da Cruz, Gelson, Jr. [2]

Diedrich Ullmann, Matheus Rudolfo [3]

Marques, Thyago Carvalho [2]

Affiliations:

[1] Fed Inst Baiano IFBaiano, Bom Jesus Da Lapa, Brazil

[2] Fed Univ Goias UFG, Sch Elect Mech & Comp Engn EMC, Goiania, Go, Brazil

[3] Fed Inst Bahia IFBA, Barreiras, Brazil

Source:

MULTIMEDIA SYSTEMS | 2023 / Vol. 29 / Issue 03

Keywords:

Gated recurrent units; Caption generation references; Convolutional neural network; Part-of-speech tagging; ATTENTION; LSTM; NETWORK;

DOI:

10.1007/s00530-022-00937-3

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Describing images in natural language is a challenging task for computer vision. Image captioning is the task of creating image descriptions. Deep learning architectures that use convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are beneficial in this task. However, traditional RNNs may cause problems, including exploding gradients, vanishing gradients, and non-descriptive sentences. To solve these problems, we propose a model based on the encoder-decoder structure, using CNNs to extract features from reference images and gated recurrent units (GRUs) to create the descriptions. Our model applies part-of-speech (PoS) analysis and the likelihood function to generate weights in GRU. This method also performs the knowledge transfer during a validation phase using the k-nearest neighbors (kNN) technique. Our experimental results using Flickr30k and MS-COCO datasets indicate that the proposed PoS-based model yields competitive scores compared to those of high-end models. The system predicts more descriptive captions and closely approximates the expected captions for both the predicted and kNN-selected captions.
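
The abstract mentions selecting captions with k-nearest neighbors (kNN) but does not say how the neighbors are computed. The following is a minimal sketch of one possible reading, assuming that each reference image is represented by a feature vector from the CNN encoder and that neighbors are ranked by cosine similarity. The function names `cosine_similarity` and `knn_captions`, the toy feature vectors, and the captions are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def knn_captions(query, references, k=2):
    """Return the captions of the k reference images closest to the query.

    `references` is a list of (feature_vector, caption) pairs.
    The similarity measure is an assumption, not the paper's stated choice.
    """
    ranked = sorted(
        references,
        key=lambda ref: cosine_similarity(query, ref[0]),
        reverse=True,
    )
    return [caption for _, caption in ranked[:k]]


# Toy, made-up vectors standing in for CNN encoder outputs.
references = [
    ([1.0, 0.0, 0.0], "a dog runs on the grass"),
    ([0.9, 0.1, 0.0], "a puppy plays in a field"),
    ([0.0, 1.0, 0.0], "a man rides a bicycle"),
    ([0.0, 0.0, 1.0], "a plate of food on a table"),
]
query = [0.95, 0.05, 0.0]
nearest = knn_captions(query, references, k=2)
print(nearest)
```

In this sketch the retrieved captions would serve as references for the query image; how the published model combines them with the GRU decoder and the PoS-based weights is not described in the abstract and is left out here.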


Pages: 1665-1681

Number of pages: 17

