Image caption generation using transformer learning methods: a case study on Instagram image

Cited by: 0
Authors:
Kwankamon Dittakan
Kamontorn Prompitak
Phutphisit Thungklang
Chatchawan Wongwattanakit
Affiliations:
[1] Prince of Songkla University, Phuket Campus, College of Computing and Faculty of Hospitality and Tourism
Source:
Keywords:
Image Captioning; Transformer Learning Model; Self-Attention Mechanism; Encoder-Decoder; Image feature extraction; Instagram image;
DOI: not available
Abstract
Nowadays, images are used ever more extensively for communication. A single image can convey a variety of stories, depending on the perspective and thoughts of each viewer. Including image captions greatly aids comprehension, especially for individuals with visual impairments who read Braille or rely on audio descriptions. The purpose of this research is to create an automatic captioning system whose output is easy to understand and quick to generate, and which can be applied to other related systems. In this research, the transformer learning process is applied to image captioning in place of the convolutional neural network (CNN) and recurrent neural network (RNN) pipeline, which has limitations in processing long-sequence data and managing data complexity; the transformer learning process handles both limitations well and more efficiently. The image captioning system was trained on a dataset of 5,000 Instagram images tagged with the hashtag "Phuket" (#Phuket). The researchers also wrote captions themselves to serve as a test set for the image captioning system. The experiments showed that the transformer learning process can generate natural captions that are close to human language. The generated captions are evaluated using the Bilingual Evaluation Understudy (BLEU) score and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score, metrics that measure the similarity between machine-generated and human-written text. This allows us to compare the resemblance between the researcher-written captions and the transformer-generated captions.
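The BLEU evaluation described in the abstract can be illustrated with a minimal pure-Python sketch of sentence-level BLEU (uniform n-gram weights, brevity penalty, light smoothing). This is an illustrative approximation under assumed tokenization and smoothing choices, not the exact implementation used in the paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of 1..max_n n-gram
    precisions, multiplied by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each candidate n-gram counts at most
        # as often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # tiny additive smoothing so one empty n-gram order
        # does not zero out the whole score
        precisions.append((overlap + 1e-9) / total)
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty discourages captions shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_avg)

# A caption identical to the reference scores ~1.0; a partial
# match scores somewhere strictly between 0 and 1.
reference = "a long tail boat on the sea at phuket".split()
candidate = "a boat on the sea in phuket".split()
print(sentence_bleu(reference, reference))
print(sentence_bleu(reference, candidate))
```

METEOR additionally rewards stem and synonym matches and word-order alignment, which is why the paper reports both scores; a full METEOR implementation needs a synonym lexicon (e.g. WordNet) and is omitted here.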
Pages: 46397-46417
Page count: 20