Image Caption Generation With Adaptive Transformer

Cited by: 0
Authors
Zhang, Wei [1 ]
Nie, Wenbo [1 ]
Li, Xinle [1 ]
Yu, Yao [1 ]
Affiliation
[1] Univ Sci & Technol Beijing, Sch Automat & Elect, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
image caption; adaptive attention; transformer;
DOI
10.1109/yac.2019.8787715
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Image captioning based on the encoder-decoder framework has made promising progress, and the application of various attention mechanisms has also greatly improved the performance of captioning models. Improving any part of the framework, or employing a more effective attention mechanism, will benefit the final performance. Based on this idea, we make improvements in two respects. First, we use a more powerful decoder. Recent work shows that the Transformer is superior to the LSTM in efficiency and performance on some NLP tasks, so we replace the traditional LSTM decoder with a Transformer to accelerate training. Second, we incorporate spatial attention and adaptive attention into the Transformer, which lets the decoder determine where and when to use image region information. We evaluate this method on the Flickr30k dataset and achieve better results.
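The abstract does not say how the adaptive attention is wired into the decoder, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a single decoder query vector, a set of image region features, and a learned "sentinel" vector that stands for non-visual (language) context; the names `adaptive_attention`, `regions`, and `sentinel` are hypothetical. Scaled dot-product scores over the regions decide where to look, and the softmax weight that falls on the sentinel acts as a gate `beta` that decides when to rely less on the image.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


def adaptive_attention(query, regions, sentinel):
    """Sketch of spatial attention with a sentinel gate (assumed design).

    query:    (d,)   decoder state for the current word
    regions:  (k, d) image region features
    sentinel: (d,)   hypothetical non-visual fallback vector
    """
    d = query.shape[-1]
    # Treat the sentinel as one extra candidate next to the k regions.
    candidates = np.vstack([regions, sentinel[None, :]])  # (k + 1, d)
    scores = candidates @ query / np.sqrt(d)              # (k + 1,)
    weights = softmax(scores)
    alpha = weights[:-1]  # spatial weights: where to look
    beta = weights[-1]    # sentinel weight: when not to look
    context = weights @ candidates                        # (d,)
    return context, alpha, beta


# Toy usage with random features (values are illustrative only).
rng = np.random.default_rng(0)
query = rng.normal(size=8)
regions = rng.normal(size=(5, 8))
sentinel = rng.normal(size=8)
context, alpha, beta = adaptive_attention(query, regions, sentinel)
```

Because the region weights and the sentinel weight come from one softmax, `alpha.sum() + beta` equals 1, so a larger `beta` directly reduces how much image information enters the context vector.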
Pages: 521-526
Page count: 6