Guiding the Long-Short Term Memory model for Image Caption Generation

Cited by: 256
Authors
Jia, Xu [1]
Gavves, Efstratios [2]
Fernando, Basura [3]
Tuytelaars, Tinne [1]
Affiliations
[1] Katholieke Univ Leuven, ESAT PSI, IMinds, Leuven, Belgium
[2] Univ Amsterdam, QUVA Lab, NL-1012 WX Amsterdam, Netherlands
[3] Australian Natl Univ, ACRV, Canberra, ACT 0200, Australia
Keywords
DOI
10.1109/ICCV.2015.277
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search to avoid bias towards short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or better than the current state-of-the-art.
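The abstract notes that beam search is biased towards short sentences unless scores are length-normalized. The sketch below illustrates that bias with one simple strategy, dividing the summed log-probability by the number of tokens. It is an assumption for illustration only: the abstract does not say which normalization strategies the paper evaluates, and `sequence_score` and the token probabilities are hypothetical.

```python
import math


def sequence_score(token_log_probs, normalize=True):
    """Score a candidate caption from its per-token log-probabilities.

    With normalize=False the score is the plain sum, which can only
    decrease as tokens are added. With normalize=True the sum is divided
    by the caption length (an assumed, illustrative strategy).
    """
    if not token_log_probs:
        raise ValueError("candidate caption must contain at least one token")
    total = sum(token_log_probs)
    if not normalize:
        return total
    return total / len(token_log_probs)


# Hypothetical candidates: a short caption with less likely tokens and a
# longer caption whose individual tokens are each more likely.
short_caption = [math.log(0.5)] * 3
long_caption = [math.log(0.6)] * 8

candidates = [short_caption, long_caption]

# Unnormalized scoring favours the short caption.
raw_winner = max(candidates, key=lambda c: sequence_score(c, normalize=False))

# Length-normalized scoring favours the longer, per-token more likely caption.
norm_winner = max(candidates, key=sequence_score)
```

In a full beam search the same scoring function would be applied when ranking completed hypotheses; only the ranking step is shown here.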
Pages: 2407-2415
Page count: 9