Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning

Cited by: 6
Authors
Chen, Hui [1 ]
Ding, Guiguang [1 ]
Lin, Zijia [2 ]
Guo, Yuchen [1 ]
Han, Jungong [3 ]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] Microsoft Res, Beijing 100084, Peoples R China
[3] Univ Lancaster, Sch Comp & Communicat, Lancaster LA1 4YW, England
Keywords
Image captioning; Attention mechanism; Memory
DOI
10.1007/978-3-030-00563-4_16
CLC Classification Code
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Image captioning, which aims to automatically generate sentences describing images, has been explored in many works. Attention-based methods have achieved impressive performance owing to their superior ability to adapt image features to the context dynamically. Since recurrent neural networks have difficulty remembering information from the distant past, we argue that the attention model may not be adequately guided by information encountered long before. In this paper, we propose a memory-enhanced attention model for image captioning, aiming to improve the attention mechanism with previously learned knowledge. Specifically, we store the visual and semantic knowledge exploited in the past into memories, and generate a global visual or semantic feature to improve the attention model. We verify the effectiveness of the proposed model on two prevalent benchmark datasets, MS COCO and Flickr30k. Comparisons with state-of-the-art models demonstrate the superiority of the proposed model.
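The mechanism described in the abstract can be sketched minimally: keep a memory of previously attended context vectors, aggregate them into a global feature, and fold that feature into the attention query over image regions. This is an illustrative NumPy sketch, not the authors' exact formulation; the mean aggregation, the additive query combination, and all function names are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_enhanced_attention(regions, hidden, memory):
    """Attend over image region features, guiding attention with a
    global feature aggregated from a memory of past context vectors.

    regions: (k, d) array of region features
    hidden:  (d,) current decoder state
    memory:  list of past (d,) context vectors (may be empty)
    """
    # Global feature from memory; mean-pooling is one simple choice.
    if memory:
        global_feat = np.mean(memory, axis=0)
    else:
        global_feat = np.zeros_like(hidden)
    # Query combines the current state with the memory summary.
    query = hidden + global_feat
    scores = regions @ query           # (k,) relevance scores
    weights = softmax(scores)          # attention distribution over regions
    context = weights @ regions        # (d,) attended context vector
    memory.append(context.copy())      # store attended knowledge for later steps
    return context, weights
```

Calling this once per decoding step grows the memory, so later steps are steered by knowledge attended to earlier, which is the abstract's stated goal of supervising attention with distant past information.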
Pages: 161-171
Page count: 11
Related Papers
50 records in total (items [21]-[30] shown)
  • [21] Collaborative strategy network for spatial attention image captioning
    Dongming Zhou
    Jing Yang
    Riqiang Bao
    [J]. Applied Intelligence, 2022, 52 : 9017 - 9032
  • [22] Memory-Enhanced Evolutionary Robotics: The Echo State Network Approach
    Hartland, Cedric
    Bredeche, Nicolas
    Sebag, Michele
    [J]. 2009 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-5, 2009, : 2788 - 2795
  • [23] Group-based Distinctive Image Captioning with Memory Attention
    Wang, Jiuniu
    Xu, Wenjia
    Wang, Qingzhong
    Chan, Antoni B.
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5020 - 5028
  • [24] Multi-view Attention with Memory Assistant for Image Captioning
    Fu, You
    Fang, Siyu
    Wang, Rui
    Yi, Xiulong
    Yu, Jianzhi
    Hua, Rong
    [J]. 2022 IEEE 6TH ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2022, : 436 - 440
  • [25] Enhanced Text-Guided Attention Model for Image Captioning
    Zhou, Yuanen
    Hu, Zhenzhen
    Zhao, Ye
    Liu, Xueliang
    Hong, Richang
    [J]. 2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2018,
  • [26] Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning
    Xu, Chunpu
    Yang, Min
    Ao, Xiang
    Shen, Ying
    Xu, Ruifeng
    Tian, Jinwen
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 214
  • [27] Recurrent Relational Memory Network for Unsupervised Image Captioning
    Guo, Dan
    Wang, Yang
    Song, Peipei
    Wang, Meng
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 920 - 926
  • [28] A Dual Self-Attention based Network for Image Captioning
    Li, ZhiYong
    Yang, JinFu
    Li, YaPing
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 1590 - 1595
  • [29] Multilevel attention and relation network based image captioning model
    Sharma, Himanshu
    Srivastava, Swati
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (07) : 10981 - 11003
  • [30] Joint Scence Network and Attention-Guided for Image Captioning
    Zhou, Dongming
    Yang, Jing
    Zhang, Canlong
    Tang, Yanping
    [J]. 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1535 - 1540