Visual Attention Based on Long-Short Term Memory Model for Image Caption Generation

Authors
Qu, Shiru [1]
Xi, Yuling [1]
Ding, Songtao [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, Xian 710072, Shaanxi, Peoples R China
Keywords
Image Caption; RNN; LSTM; CNN; Attention Mechanism
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image caption generation has become a rising topic in computer vision and artificial intelligence. To address the problem of stiff descriptions, we extract richer features using a convolutional neural network (CNN). We therefore propose a neural, probabilistic framework that combines the CNN with a special form of recurrent neural network (RNN) to produce image captions end to end. The model uses word-to-vector embeddings to encode the variable-length input into a fixed-dimensional vector. Because descriptions of the objects in an image are often not specific enough, we introduce an attention mechanism and use visualization to show how the model is able to fix its gaze on salient objects. We validate the model on three benchmark datasets and report strong performance on standard evaluation metrics.
Pages: 4789-4794
Page count: 6