Image caption generation using transformer learning methods: a case study on Instagram image

Cited by: 0
Authors:
Kwankamon Dittakan
Kamontorn Prompitak
Phutphisit Thungklang
Chatchawan Wongwattanakit
Affiliations:
[1] Prince of Songkla University, Phuket Campus, College of Computing and Faculty of Hospitality and Tourism
Source:
Keywords:
Image Captioning; Transformer Learning Model; Self-Attention Mechanism; Encoder-Decoder; Image feature extraction; Instagram image;
DOI: not available
Abstract
Nowadays, images are used ever more extensively for communication. A single image can convey a variety of stories, depending on the perspective and thoughts of each viewer. Including image captions greatly aids comprehension, especially for individuals with visual impairments who read Braille or rely on audio descriptions. The purpose of this research is to create an automatic captioning system whose output is easy to understand and quick to generate, and which can be applied to other related systems. In this research, the transformer learning process is applied to image captioning in place of the convolutional neural network (CNN) and recurrent neural network (RNN) pipeline, which has limitations in processing long-sequence data and managing data complexity; the transformer learning process handles both limitations well and more efficiently. The image captioning system was trained on a dataset of 5,000 Instagram images tagged with the hashtag "Phuket" (#Phuket). The researchers also wrote captions themselves to serve as a test set for the image captioning system. The experiments showed that the transformer learning process can generate natural captions that are close to human language. The generated captions are evaluated using the Bilingual Evaluation Understudy (BLEU) score and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score, metrics that measure the similarity between machine-generated and human-written text. This allows us to compare the resemblance between the researcher-written captions and the transformer-generated captions.
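The BLEU evaluation described in the abstract can be illustrated with a minimal pure-Python sketch of sentence-level BLEU (uniform n-gram weights, brevity penalty, light smoothing). This is an illustrative approximation under assumed tokenization and smoothing choices, not the exact implementation used in the paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of 1..max_n n-gram
    precisions, multiplied by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each candidate n-gram counts at most
        # as often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # tiny additive smoothing so one empty n-gram order
        # does not zero out the whole score
        precisions.append((overlap + 1e-9) / total)
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty discourages captions shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_avg)

# A caption identical to the reference scores ~1.0; a partial
# match scores somewhere strictly between 0 and 1.
reference = "a long tail boat on the sea at phuket".split()
candidate = "a boat on the sea in phuket".split()
print(sentence_bleu(reference, reference))
print(sentence_bleu(reference, candidate))
```

METEOR additionally rewards stem and synonym matches and word-order alignment, which is why the paper reports both scores; a full METEOR implementation needs a synonym lexicon (e.g. WordNet) and is omitted here.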
Pages: 46397-46417
Page count: 20