Image caption generation using transformer learning methods: a case study on instagram image

Cited by: 0
Authors
Kwankamon Dittakan
Kamontorn Prompitak
Phutphisit Thungklang
Chatchawan Wongwattanakit
Affiliations
[1] Prince of Songkla University, College of Computing and Faculty of Hospitality and Tourism
[2] Phuket Campus
Source
Multimedia Tools and Applications, 2023, 83 (15)
Keywords
Image Captioning; Transformer Learning Model; Self-Attention Mechanism; Encoder-Decoder; Image feature extraction; Instagram image
DOI
Not available
Abstract
Images are increasingly used for communication. A single image can convey a variety of stories, depending on the perspective and thoughts of each viewer. Including image captions aids comprehension and is especially beneficial for individuals with visual impairments who read Braille or rely on audio descriptions. The purpose of this research is to create an automatic captioning system whose captions are easy to understand and quick to generate, and which can be applied to other related systems. In this research, the transformer learning process is applied to image captioning instead of the convolutional neural network (CNN) and recurrent neural network (RNN) process, which has limitations in processing long-sequence data and managing data complexity. The transformer learning process handles these limitations more efficiently. The image captioning system was trained on a dataset of 5,000 Instagram images tagged with the hashtag "Phuket" (#Phuket). The researchers also wrote captions themselves to use as a dataset for testing the image captioning system. The experiments showed that the transformer learning process can generate natural captions that are close to human language. The generated captions are also evaluated using the Bilingual Evaluation Understudy (BLEU) score and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score, metrics that measure the similarity between machine-translated text and human-written text. This allows the resemblance between the researcher-written captions and the transformer-generated captions to be compared.
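To make the BLEU comparison mentioned in the abstract concrete, the following is a minimal sketch of a simplified sentence-level BLEU score in plain Python. It is not the authors' evaluation code: the function names `ngrams` and `sentence_bleu`, the choice of bigrams as the maximum n-gram order, the whitespace tokenization, the absence of smoothing, and the two example captions are all assumptions made for illustration. METEOR is not covered here because it additionally relies on stemming and synonym matching.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=2):
    """Simplified single-reference BLEU with uniform n-gram weights."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty discourages captions shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


# Hypothetical captions, not taken from the paper's dataset.
human_caption = "a boat on the beach in phuket"
model_caption = "a boat on the beach"
score = sentence_bleu(human_caption, model_caption)
print(f"BLEU (up to bigrams): {score:.3f}")
```

In this example every unigram and bigram of the generated caption appears in the human caption, so the score is reduced only by the brevity penalty for the shorter output. A production evaluation would more likely use an established implementation with smoothing and multiple references.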
Pages: 46397-46417
Number of pages: 20
Related papers
50 in total
  • [1] Image caption generation using transformer learning methods: a case study on instagram image
    Dittakan, Kwankamon
    Prompitak, Kamontorn
    Thungklang, Phutphisit
    Wongwattanakit, Chatchawan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 46397 - 46417
  • [2] Image Caption Generation With Adaptive Transformer
    Zhang, Wei
    Nie, Wenbo
    Li, Xinle
    Yu, Yao
    2019 34RD YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2019, : 521 - 526
  • [3] Remote sensing image caption generation via transformer and reinforcement learning
    Shen, Xiangqing
    Liu, Bing
    Zhou, Yong
    Zhao, Jiaqi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (35-36) : 26661 - 26682
  • [4] Remote sensing image caption generation via transformer and reinforcement learning
    Xiangqing Shen
    Bing Liu
    Yong Zhou
    Jiaqi Zhao
    Multimedia Tools and Applications, 2020, 79 : 26661 - 26682
  • [5] Automatic image caption generation using deep learning
    Verma, Akash
    Yadav, Arun Kumar
    Kumar, Mohit
    Yadav, Divakar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 5309 - 5325
  • [6] Image Caption Generation using Deep Learning Technique
    Amritkar, Chetan
    Jabade, Vaishali
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [7] Automatic image caption generation using deep learning
    Akash Verma
    Arun Kumar Yadav
    Mohit Kumar
    Divakar Yadav
    Multimedia Tools and Applications, 2024, 83 : 5309 - 5325
  • [8] An Overview of Image Caption Generation Methods
    Wang, Haoran
    Zhang, Yue
    Yu, Xiaosheng
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [9] Transformer based image caption generation for news articles
    Pande, Ashtavinayak
    Pandey, Atul
    Solanki, Ayush
    Shanbhag, Chinmay
    Motghare, Manish
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2023, 14 (01):
  • [10] A transformer-based Urdu image caption generation
    Hadi M.
    Safder I.
    Waheed H.
    Zaman F.
    Aljohani N.R.
    Nawaz R.
    Hassan S.U.
    Sarwar R.
    Journal of Ambient Intelligence and Humanized Computing, 2024, 15 (9) : 3441 - 3457