Image caption generation using transformer learning methods: a case study on instagram image

Cited by: 0
Authors
Kwankamon Dittakan
Kamontorn Prompitak
Phutphisit Thungklang
Chatchawan Wongwattanakit
Affiliations
[1] Prince of Songkla University, Phuket Campus, College of Computing and Faculty of Hospitality and Tourism
Keywords
Image Captioning; Transformer Learning Model; Self-Attention Mechanism; Encoder-Decoder; Image feature extraction; Instagram image;
DOI: not available
Abstract
Nowadays, images are used extensively for communication. A single image can convey a variety of stories, depending on the perspective and thoughts of each viewer. Including image captions greatly aids comprehension, especially for individuals with visual impairments who read Braille or rely on audio descriptions. The purpose of this research is to create an automatic captioning system whose captions are easy to understand and quick to generate, and which can be applied to other related systems. In this research, a transformer learning model is applied to image captioning in place of the conventional combination of convolutional neural networks (CNN) and recurrent neural networks (RNN), which has limitations in processing long-sequence data and managing data complexity; the transformer model handles these limitations well and more efficiently. The image captioning system was trained on a dataset of 5,000 images from Instagram tagged with the hashtag "Phuket" (#Phuket). The researchers also wrote captions themselves to use as a dataset for testing the image captioning system. The experiments showed that the transformer learning model can generate natural captions that are close to human language. The generated captions are also evaluated using the Bilingual Evaluation Understudy (BLEU) score and the Metric for Evaluation of Translation with Explicit Ordering (METEOR) score, metrics that measure the similarity between machine-generated text and human-written text. This allows us to compare the resemblance between the researcher-written captions and the transformer-generated captions.
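The self-attention mechanism named in the keywords is the core operation that lets the transformer relate all positions of a sequence at once, rather than step by step as an RNN does. A minimal pure-Python sketch of scaled dot-product attention (illustrative only; the paper's actual encoder-decoder architecture and parameters are not reproduced here) looks like this:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: lists of equal-length vectors (one per token).

    For each query, compute dot-product scores against every key,
    scale by sqrt(d_k), softmax into weights, and return the
    weighted sum of the value vectors.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

In a full transformer, Q, K, and V are learned linear projections of the token embeddings, the operation is repeated across multiple heads, and in the decoder the queries come from the caption being generated while the keys and values come from the encoded image features.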
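The BLEU score used in the evaluation combines modified n-gram precisions with a brevity penalty. A minimal single-reference, sentence-level sketch (uniform weights up to 4-grams, no smoothing; the paper's exact evaluation configuration is not specified) can be written as:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference.

    Each n-gram precision is clipped by the reference counts
    ("modified precision"), the precisions are combined by a
    geometric mean, and short candidates are penalized by the
    brevity penalty.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any empty precision zeroes the geometric mean
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

METEOR, the second metric, additionally aligns stems and synonyms and rewards correct word order via a fragmentation penalty, which is why it often correlates better with human judgments on short captions.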
Pages: 46397-46417 (20 pages)