Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning

Cited by: 6
Authors
Chen, Hui [1]
Ding, Guiguang [1]
Lin, Zijia [2]
Guo, Yuchen [1]
Han, Jungong [3]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Microsoft Res, Beijing 100084, Peoples R China
[3] Univ Lancaster, Sch Comp & Communicat, Lancaster LA1 4YW, England
Keywords
Image captioning; Attention mechanism; Memory;
DOI
10.1007/978-3-030-00563-4_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image captioning, which aims to automatically generate sentences describing images, has been explored in many works. Attention-based methods have achieved impressive performance owing to their superior ability to adapt the image's features to the context dynamically. Since recurrent neural networks have difficulty remembering information from too far in the past, we argue that the attention model may not be adequately supervised by guidance from distant previous information. In this paper, we propose a memory-enhanced attention model for image captioning, aiming to improve the attention mechanism with previously learned knowledge. Specifically, we store the visual and semantic knowledge that has been exploited in the past in memories, and generate a global visual or semantic feature to improve the attention model. We verify the effectiveness of the proposed model on two prevalent benchmark datasets, MS COCO and Flickr30k. Comparison with state-of-the-art models demonstrates the superiority of the proposed model.
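The idea of letting a summary of past attended features guide the current attention step can be illustrated with a small sketch. This is not the paper's model: the abstract gives no equations, so the mean-pooled memory summary, the scaled dot-product scoring, and every name below (`attend`, `mean_vector`, `gate`) are assumptions chosen only to make the idea concrete.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors, dim):
    """Average the stored vectors; an empty memory yields a zero vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attend(regions, hidden, memory, gate=0.5):
    """One soft-attention step whose query also depends on a memory summary.

    regions: list of k region feature vectors, each of length d
    hidden:  current decoder state, length d
    memory:  previously attended context vectors, each of length d
    gate:    hypothetical scalar weighting the memory summary (assumption)
    """
    dim = len(hidden)
    # Assumption: the "global feature" is a plain mean over the memory.
    global_feat = mean_vector(memory, dim)
    query = [h + gate * g for h, g in zip(hidden, global_feat)]
    scores = [dot(r, query) / math.sqrt(dim) for r in regions]
    alpha = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [sum(a * r[j] for a, r in zip(alpha, regions)) for j in range(dim)]
    return alpha, context


# Toy run: 3 regions with 4-dimensional features, 5 decoding steps.
random.seed(0)
d, k = 4, 3
regions = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
hidden = [random.uniform(-1, 1) for _ in range(d)]
memory = []
for _ in range(5):
    alpha, context = attend(regions, hidden, memory)
    memory.append(context)  # store what was attended at this step
    hidden = context        # stand-in for a recurrent state update
```

In this toy loop the memory grows by one context vector per step, so later steps are scored against a query that carries information from every earlier step rather than from the recurrent state alone.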
Pages: 161-171
Page count: 11
Related papers
50 in total
  • [1] GMEKT: A Novel Graph Attention-Based Memory-Enhanced Knowledge Tracing
    Chen, Mianfan
    Ma, Wenjun
    Mao, Shun
    Jiang, Yuncheng
    [J]. PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2022, 13629 : 408 - 421
  • [2] Hierarchical Attention Network for Image Captioning
    Wang, Weixuan
    Chen, Zhihong
    Hu, Haifeng
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8957 - 8964
  • [3] Hybrid attention network for image captioning
    Jiang, Wenhui
    Li, Qin
    Zhan, Kun
    Fang, Yuming
    Shen, Fei
    [J]. DISPLAYS, 2022, 73
  • [4] Multivariate Attention Network for Image Captioning
    Wang, Weixuan
    Chen, Zhihong
    Hu, Haifeng
    [J]. COMPUTER VISION - ACCV 2018, PT VI, 2019, 11366 : 587 - 602
  • [5] Memory-Enhanced Knowledge Reasoning with Reinforcement Learning
    Guo, Jinhui
    Zhang, Xiaoli
    Liang, Kun
    Zhang, Guoqiang
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [6] Attend to You: Personalized Image Captioning with Context Sequence Memory Networks
    Park, Cesc Chunseong
    Kim, Byeongchang
    Kim, Gunhee
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6432 - 6440
  • [7] Self-Enhanced Attention for Image Captioning
    Sun, Qingyu
    Zhang, Juan
    Fang, Zhijun
    Gao, Yongbin
    [J]. NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [8] Self-Enhanced Attention for Image Captioning
    Qingyu Sun
    Juan Zhang
    Zhijun Fang
    Yongbin Gao
    [J]. Neural Processing Letters, 56
  • [9] MS-HGAT: Memory-Enhanced Sequential Hypergraph Attention Network for Information Diffusion Prediction
    Sun, Ling
    Rao, Yuan
    Zhang, Xiangbo
    Lan, Yuqian
    Yu, Shuanghe
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 4156 - 4164
  • [10] A SEQUENTIAL GUIDING NETWORK WITH ATTENTION FOR IMAGE CAPTIONING
    Sow, Daouda
    Qin, Zengchang
    Niasse, Mouhamed
    Wan, Tao
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3802 - 3806