Image Caption with Endogenous–Exogenous Attention

Cited by: 1
Authors
Teng Wang
Haifeng Hu
Chen He
Affiliations
[1] School of Electronic and Information Engineering, Sun Yat-sen University
Source
Neural Processing Letters | 2019 / Volume 50
Keywords
Image caption; Convolutional neural network; Recurrent neural network; Visual attention
DOI
Not available
Abstract
Automatically generating captions for an image is a fundamental problem in computer vision and natural language processing: it requires translating the content of an image into natural language with correct grammar and structure. Attention-based models have been widely adopted for captioning tasks. However, most attention models generate only a single deterministic attention heat map indicating where to look. These models ignore endogenous orienting, which depends on the interests, goals, or desires of the observer, and thereby constrain the diversity of the generated captions. To improve both the accuracy and diversity of the generated sentences, we present a novel endogenous–exogenous attention architecture that captures both endogenous attention, which corresponds to stochastic visual orienting, and exogenous attention, which corresponds to deterministic visual orienting. At each time step, our model generates two attention maps, an endogenous heat map and an exogenous heat map, and fuses them into the hidden state of an LSTM for sequential word generation. We evaluate our model on the Flickr30k and MSCOCO datasets; experiments demonstrate both the accuracy of the model and the diversity of the captions it generates. Our model outperforms state-of-the-art methods.
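The abstract contrasts a deterministic (exogenous) attention map with a stochastic (endogenous) one, both computed over image region features and fused for the LSTM decoder. The following is a minimal NumPy sketch of that idea only; the bilinear scoring function, the one-hot sampling for the stochastic branch, and the concatenation-based fusion are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def exogenous_attention(feats, h, W):
    # Deterministic orienting: a soft distribution over image regions
    scores = feats @ W @ h          # one bilinear score per region (assumed form)
    return softmax(scores)

def endogenous_attention(feats, h, W, rng):
    # Stochastic orienting: sample a single region from the score distribution
    probs = softmax(feats @ W @ h)
    idx = rng.choice(len(probs), p=probs)
    hard = np.zeros_like(probs)
    hard[idx] = 1.0                 # one-hot "where the observer chose to look"
    return hard

# Toy dimensions: 5 image regions, feature dim 8, LSTM hidden dim 8
feats = rng.standard_normal((5, 8))   # region features from a CNN (stand-in)
h = rng.standard_normal(8)            # previous LSTM hidden state (stand-in)
W = rng.standard_normal((8, 8))

alpha_exo = exogenous_attention(feats, h, W)
alpha_endo = endogenous_attention(feats, h, W, rng)

# Fuse the two attended context vectors; the real model folds this into
# the LSTM hidden state at each decoding step
ctx = np.concatenate([alpha_exo @ feats, alpha_endo @ feats])
```

In a full decoder this fused context would condition the next-word prediction at every time step; here it just illustrates that the exogenous branch yields a soft weighting while the endogenous branch yields a sampled, one-hot selection.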
Pages: 431–443
Page count: 12