Attend to Knowledge: Memory-Enhanced Attention Network for Image Captioning

Cited by: 6
Authors
Chen, Hui [1]
Ding, Guiguang [1]
Lin, Zijia [2]
Guo, Yuchen [1]
Han, Jungong [3]
Affiliations
[1] School of Software, Tsinghua University, Beijing 100084, People's Republic of China
[2] Microsoft Research, Beijing 100084, People's Republic of China
[3] School of Computing and Communications, Lancaster University, Lancaster LA1 4YW, England
Keywords
Image captioning; Attention mechanism; Memory
DOI
10.1007/978-3-030-00563-4_16
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning, which aims to automatically generate sentences describing images, has been explored in many works. Attention-based methods have achieved impressive performance because of their ability to adapt image features to the context dynamically. Since recurrent neural networks have difficulty remembering information from the distant past, we argue that the attention model may not be adequately guided by information from much earlier steps. In this paper, we propose a memory-enhanced attention model for image captioning, which aims to improve the attention mechanism with previously learned knowledge. Specifically, we store the visual and semantic knowledge that has been exploited in the past in memories, and generate a global visual or semantic feature to improve the attention model. We verify the effectiveness of the proposed model on two prevalent benchmark datasets, MS COCO and Flickr30k. Comparisons with state-of-the-art models demonstrate the superiority of the proposed model.
Pages: 161-171
Page count: 11
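The abstract describes the memory mechanism only at a high level. Below is a minimal, hypothetical sketch in plain Python of one way a memory of past attended contexts could supply a global feature to soft attention. It is not the authors' model. Several details are assumptions: the use of dot-product attention scores, a memory that stores each step's attended visual context, and a global feature computed by attending over that memory with the current decoder hidden state. The function names `memory_enhanced_attention` and `global_memory_feature` are illustrative only. The semantic memory mentioned in the abstract could work analogously by storing past word features instead of visual contexts.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_memory_feature(memory: List[Vector], query: Vector) -> Vector:
    """Hypothetical global feature: attention-weighted sum of stored past contexts.

    Assumption: the current hidden state is used as the query over the memory.
    Returns a zero vector when nothing has been stored yet.
    """
    if not memory:
        return [0.0] * len(query)
    weights = softmax([dot(m, query) for m in memory])
    return [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(len(query))]


def memory_enhanced_attention(
    regions: List[Vector], hidden: Vector, memory: List[Vector]
) -> Tuple[Vector, List[float]]:
    """One decoding step of soft attention guided by a memory of past contexts.

    Assumption: the global memory feature is added to the hidden state to form
    the attention query, and the new context is appended to the memory in place.
    """
    g = global_memory_feature(memory, hidden)
    query = [h + gi for h, gi in zip(hidden, g)]
    alpha = softmax([dot(r, query) for r in regions])  # attention weights over regions
    context = [sum(a * r[d] for a, r in zip(alpha, regions)) for d in range(len(hidden))]
    memory.append(context)  # remember what was attended at this step
    return context, alpha


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    memory: List[Vector] = []
    for hidden in ([1.0, 0.0], [0.0, 1.0]):
        context, alpha = memory_enhanced_attention(regions, hidden, memory)
        print([round(a, 3) for a in alpha])
```

Calling `memory_enhanced_attention` once per decoding step gives later steps access to a summary of contexts attended earlier, even if that information has faded from the recurrent hidden state. A learned model would replace the raw dot products with trained projections.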