Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph

Cited by: 7
Authors
Zhao, Wentian [1]
Wu, Xinxiao [1, 2]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Key Lab Intelligent Informat Technol, Beijing 100081, Peoples R China
[2] Shenzhen MSU BIT Univ, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen 518172, Peoples R China
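Funding
National Natural Science Foundation of China
Keywords
Image captioning; named entity; knowledge graph
DOI
10.1109/TMM.2023.3301279
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract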
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article. This task remains challenging as it is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities. Furthermore, the complexity of the article brings difficulty in extracting fine-grained relationships between entities to generate informative event descriptions about the image. To tackle these challenges, we propose a novel approach that constructs a multi-modal knowledge graph (MMKG) to associate the visual objects with named entities and capture the relationship between entities simultaneously with the help of external knowledge collected from the web. Specifically, we build a text sub-graph by extracting named entities and their relationships from the article, and build an image sub-graph by detecting the objects in the image. To connect these two sub-graphs, we propose a cross-modal entity matching module trained using a knowledge base that contains Wikipedia entries and the corresponding images. Finally, the MMKG is integrated into the captioning model via a graph attention mechanism. Extensive experiments on both GoodNews and NYTimes800k datasets demonstrate the effectiveness of our method.
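The pipeline described in the abstract (text sub-graph, image sub-graph, cross-modal links, attention over the graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_mmkg` and `attend`, the feature vectors, the `matches` edge label, and the similarity threshold are all hypothetical. In particular, plain cosine similarity stands in for the trained cross-modal entity matching module, and a single dot-product softmax over neighbours stands in for the graph attention mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_mmkg(entity_feats, relations, object_feats, threshold=0.8):
    """Join a text sub-graph and an image sub-graph into one graph.

    entity_feats: {entity_name: vector} for named entities from the article
    relations:    [(head, relation, tail)] triples between named entities
    object_feats: {object_id: vector} for objects detected in the image
    threshold:    assumed cut-off for linking an object to an entity
    """
    nodes = {("text", e): v for e, v in entity_feats.items()}
    nodes.update({("image", o): v for o, v in object_feats.items()})
    # Text sub-graph edges: entity-to-entity relations from the article.
    edges = [(("text", h), r, ("text", t)) for h, r, t in relations]
    # Cross-modal edges: link each object to its most similar entity.
    # Assumption: cosine similarity replaces the trained matching module.
    for o, ov in object_feats.items():
        best, best_score = None, threshold
        for e, ev in entity_feats.items():
            score = cosine(ov, ev)
            if score >= best_score:
                best, best_score = e, score
        if best is not None:
            edges.append((("image", o), "matches", ("text", best)))
    return nodes, edges

def attend(query, nodes, edges, node):
    """Softmax-weighted average of the neighbours of `node`.

    Assumption: a single dot-product attention head, used only to
    illustrate attention over graph neighbours.
    """
    neighbours = [t for h, _, t in edges if h == node]
    neighbours += [h for h, _, t in edges if t == node]
    if not neighbours:
        return list(nodes[node])
    scores = [sum(q * x for q, x in zip(query, nodes[n])) for n in neighbours]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * nodes[n][j] for w, n in zip(weights, neighbours))
        for j in range(len(query))
    ]

# Toy data: two entities from an article, one detected object.
entity_feats = {"Jane Doe": [1.0, 0.0], "Paris": [0.0, 1.0]}
relations = [("Jane Doe", "visited", "Paris")]
object_feats = {"person_0": [0.9, 0.1]}

nodes, edges = build_mmkg(entity_feats, relations, object_feats)
context = attend([1.0, 0.0], nodes, edges, ("text", "Jane Doe"))
```

In this toy setup the detected object is linked to the entity whose feature vector it most resembles, and the resulting context vector mixes the features of the entity's textual and visual neighbours. A captioning decoder would consume such context vectors at each step; that part is omitted here.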
Pages: 2659-2670
Number of pages: 12