Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images

Cited by: 1

Authors:

Nursikuwagus, Agus [1]

Munir, Rinaldi [2]

Khodra, Masayu Leylia [2]

Affiliations:

[1] Inst Teknologi Bandung, Sch Elect Engn & Informat, Doctoral Program Informat, Jl Ganesha 10, Bandung 40132, Indonesia

[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Dept Informat, Jl Ganesha 10, Bandung 40132, Indonesia

Source:

JOURNAL OF IMAGING | 2022 / Volume 8 / Issue 11

Keywords:

deep learning; vector embedding; convolutional neural network; recurrent neural network; SEMANTIC ATTENTION; NETWORK; MODEL;

DOI:

10.3390/jimaging8110294

Chinese Library Classification (CLC) number:

TB8 [Photographic technology];

Discipline classification code:

0804;

Abstract:

Captioning is the process of assembling a description for an image. Captioning distinguishes two kinds of objects: background objects and foreground objects. Previous research on captioning has usually focused on foreground objects. In contrast, generating captions from geological images of rocks focuses more on the background of the images. This study proposed an image-captioning model that uses a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec to generate words from the image. The proposed model was built from these three components and gave a dense output of 256 units. To make the caption properly grammatical, the sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. The pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. The BLEU-1 scores achieved by these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. The BLEU-2 scores were 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. The BLEU-3 scores were 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, the BLEU-4 scores were 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. Our CNN-C model outperformed the other models, especially the baseline model. Several challenges remain for future work on captioning, such as geological sentence structure, geological sentence phrases, and constructing words with a geological tagger.
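The abstract states that predicted words are assembled into a sentence by beam search with K = 3. The following is only a minimal sketch of generic beam search under stated assumptions, not the authors' implementation: the function name `beam_search`, the toy probability table `TOY`, and the rock-related vocabulary are all hypothetical stand-ins for the next-word distribution that a trained CNN-LSTM decoder would supply.

```python
import math

def beam_search(next_word_probs, start, end, k=3, max_len=10):
    """Return the k best (sequence, log-probability) pairs.

    next_word_probs(seq) must return a dict mapping each candidate
    next word to its probability given the partial sequence `seq`.
    """
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end:
                # Finished sequences are carried forward unchanged.
                candidates.append((seq, score))
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        # Keep only the k highest-scoring partial sentences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
        if all(seq[-1] == end for seq, _ in beams):
            break
    return beams

# Hypothetical toy model: the next word depends only on the previous word.
TOY = {
    "<s>": {"grey": 0.6, "coarse": 0.4},
    "grey": {"sandstone": 0.7, "rock": 0.3},
    "coarse": {"sandstone": 0.9, "rock": 0.1},
    "sandstone": {"</s>": 1.0},
    "rock": {"</s>": 1.0},
}

def toy_model(seq):
    return TOY[seq[-1]]

best = beam_search(toy_model, "<s>", "</s>", k=3)
for seq, score in best:
    print(" ".join(seq[1:-1]), round(math.exp(score), 2))
```

With K = 3, the search keeps three partial sentences at each step instead of committing to the single most probable next word, which is why it can recover a sentence that a greedy decoder would miss.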


Pages: 18
