Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images

Cited by: 1
Authors
Nursikuwagus, Agus [1]
Munir, Rinaldi [2]
Khodra, Masayu Leylia [2]
Affiliations
[1] Inst Teknologi Bandung, Sch Elect Engn & Informat, Doctoral Program Informat, Jl Ganesha 10, Bandung 40132, Indonesia
[2] Inst Teknol Bandung, Sch Elect Engn & Informat, Dept Informat, Jl Ganesha 10, Bandung 40132, Indonesia
Keywords
deep learning; vector embedding; convolutional neural network; recurrent neural network; semantic attention; network; model
DOI
10.3390/jimaging8110294
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Image captioning is the process of generating a textual description of an image. Captioning distinguishes two kinds of objects, foreground and background, and previous research has usually focused on foreground objects. In contrast, generating captions for geological images of rocks depends mainly on the image background. This study proposes an image-captioning model that combines a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec to generate words from the image, with a dense output of 256 units. To produce a grammatical sentence, the sequence of predicted words is reconstructed by the beam search algorithm with K = 3. The pre-trained baseline model VGG16 and the proposed CNN-A, CNN-B, CNN-C, and CNN-D models were evaluated with N-gram BLEU scores. The BLEU-1 scores for these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. The BLEU-2 scores were 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. The BLEU-3 scores were 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, the BLEU-4 scores were 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. The proposed CNN-C model outperformed the other models, including the baseline, on all four BLEU metrics. Open challenges for future work include geological sentence structure, geological phrases, and word construction with a geological tagger.
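The beam search step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `beam_search`, the lookup table `TOY_MODEL`, and the words and probabilities in it are hypothetical, standing in for the next-word distribution that the CNN/LSTM decoder would produce. Only the beam width K = 3 comes from the abstract.

```python
import math
from typing import Callable, Dict, List, Tuple


def beam_search(
    next_word_probs: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,  # K = 3, as stated in the abstract
    max_len: int = 10,
    start: str = "<start>",
    end: str = "<end>",
) -> List[str]:
    """Return the best word sequence found while keeping beam_width candidates per step."""
    # Each beam entry is (cumulative log-probability, word sequence).
    beams: List[Tuple[float, List[str]]] = [(0.0, [start])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            if seq[-1] == end:
                # Finished captions are carried forward unchanged.
                candidates.append((score, seq))
                continue
            for word, p in next_word_probs(seq).items():
                if p > 0.0:
                    candidates.append((score + math.log(p), seq + [word]))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == end for _, seq in beams):
            break
    best = beams[0][1]
    return [w for w in best if w not in (start, end)]


# Hypothetical next-word distributions keyed by the previous word only.
# A real decoder would condition on the image features and the full prefix.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<start>": {"grey": 0.6, "coarse": 0.4},
    "grey": {"sandstone": 0.55, "rock": 0.45},
    "coarse": {"sandstone": 0.9, "rock": 0.1},
    "sandstone": {"<end>": 1.0},
    "rock": {"<end>": 1.0},
}


def toy_next_word_probs(seq: List[str]) -> Dict[str, float]:
    return TOY_MODEL[seq[-1]]


beam_caption = beam_search(toy_next_word_probs, beam_width=3)
greedy_caption = beam_search(toy_next_word_probs, beam_width=1)
print(beam_caption)    # ['coarse', 'sandstone']
print(greedy_caption)  # ['grey', 'sandstone']
```

In this toy setup, greedy decoding (a beam of 1) commits to the locally most likely first word, while a beam of 3 keeps the alternative alive and finds the sequence with the higher overall probability (0.36 versus 0.33).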
Pages: 18
Related papers
25 in total
  • [1] Deep Reinforcement Learning-based Image Captioning with Embedding Reward
    Ren, Zhou
    Wang, Xiaoyu
    Zhang, Ning
    Lv, Xutao
    Li, Li-Jia
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1151 - 1159
  • [2] Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning
    Honda, Ukyo
    Ushiku, Yoshitaka
    Hashimoto, Atsushi
    Watanabe, Taro
    Matsumoto, Yuji
    [J]. 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3692 - 3702
  • [3] AraCap: A hybrid deep learning architecture for Arabic Image Captioning
    Afyouni, Imad
    Azhar, Imtinan
    Elnagar, Ashraf
    [J]. AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 382 - 389
  • [4] Image Captioning Using Detectors and Swarm Based Learning Approach for Word Embedding Vectors
    Lalitha, B.
    Gomathi, V
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2023, 44 (01): : 173 - 189
  • [5] Deep Learning for automatically describing images in natural language - Image Captioning
    Hotaran, Anca Mihaela
    Vrejoiu, Mihnea Horia
    [J]. ROMANIAN JOURNAL OF INFORMATION TECHNOLOGY AND AUTOMATIC CONTROL-REVISTA ROMANA DE INFORMATICA SI AUTOMATICA, 2020, 30 (01): : 87 - 100
  • [6] Generating Image Captions in Arabic Using Root-Word Based Recurrent Neural Networks and Deep Neural Networks
    Jindal, Vasu
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 8093 - 8094
  • [7] Image and audio caps: automated captioning of background sounds and images using deep learning
    M. Poongodi
    Mounir Hamdi
    Huihui Wang
    [J]. Multimedia Systems, 2023, 29 : 2951 - 2959
  • [8] Image and audio caps: automated captioning of background sounds and images using deep learning
    Poongodi, M.
    Hamdi, Mounir
    Wang, Huihui
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2951 - 2959
  • [9] Advancing geological image segmentation: Deep learning approaches for rock type identification and classification
    Gupta, Amit Kumar
    Mathur, Priya
    Sheth, Farhan
    Travieso-Gonzalez, Carlos M.
    Chaurasia, Sandeep
    [J]. APPLIED COMPUTING AND GEOSCIENCES, 2024, 23
  • [10] Remote sensing image description based on word embedding and end-to-end deep learning
    Wang, Yuan
    Ma, Hongbing
    Alifu, Kuerban
    Lv, Yalong
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)