Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph

Cited by: 7

Authors:

Zhao, Wentian ^{[1]}

Wu, Xinxiao ^{[1,2]}

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Key Lab Intelligent Informat Technol, Beijing 100081, Peoples R China

[2] Shenzhen MSU BIT Univ, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen 518172, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Image captioning; named entity; knowledge graph;

DOI:

10.1109/TMM.2023.3301279

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article. This task remains challenging as it is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities. Furthermore, the complexity of the article brings difficulty in extracting fine-grained relationships between entities to generate informative event descriptions about the image. To tackle these challenges, we propose a novel approach that constructs a multi-modal knowledge graph (MMKG) to associate the visual objects with named entities and capture the relationship between entities simultaneously with the help of external knowledge collected from the web. Specifically, we build a text sub-graph by extracting named entities and their relationships from the article, and build an image sub-graph by detecting the objects in the image. To connect these two sub-graphs, we propose a cross-modal entity matching module trained using a knowledge base that contains Wikipedia entries and the corresponding images. Finally, the MMKG is integrated into the captioning model via a graph attention mechanism. Extensive experiments on both GoodNews and NYTimes800k datasets demonstrate the effectiveness of our method.
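The pipeline outlined in the abstract (a text sub-graph, an image sub-graph, cross-modal links, then attention over the graph) can be illustrated with a small toy sketch. This is not the authors' implementation: the `MMKG` class, `link_modalities`, `attend`, the feature vectors, and the entity names are all hypothetical, and a plain cosine-similarity threshold stands in for the trained cross-modal entity matching module.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MMKG:
    """Toy multi-modal knowledge graph (hypothetical structure, for illustration)."""
    nodes: dict = field(default_factory=dict)  # name -> (modality, feature vector)
    edges: set = field(default_factory=set)    # (source, relation, target)

    def add_node(self, name, modality, feat):
        self.nodes[name] = (modality, list(feat))

    def add_edge(self, src, rel, dst):
        self.edges.add((src, rel, dst))

    def neighbors(self, name):
        # Edges are treated as undirected when gathering attention context.
        out = set()
        for s, _, d in self.edges:
            if s == name:
                out.add(d)
            elif d == name:
                out.add(s)
        return sorted(out)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link_modalities(graph, threshold=0.8):
    """Assumed stand-in for entity matching: link each image node to its
    most similar text node when the similarity reaches the threshold."""
    texts = [(n, f) for n, (m, f) in graph.nodes.items() if m == "text"]
    for name, (mod, feat) in list(graph.nodes.items()):
        if mod != "image" or not texts:
            continue
        best, score = max(((t, cosine(feat, tf)) for t, tf in texts),
                          key=lambda pair: pair[1])
        if score >= threshold:
            graph.add_edge(name, "matches", best)


def attend(graph, name):
    """Single-head dot-product attention of one node over its neighbors."""
    query = graph.nodes[name][1]
    nbrs = graph.neighbors(name)
    if not nbrs:
        return {}, list(query)
    scores = [sum(q * k for q, k in zip(query, graph.nodes[n][1])) for n in nbrs]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * graph.nodes[n][1][i] for w, n in zip(weights, nbrs))
               for i in range(len(query))]
    return dict(zip(nbrs, weights)), context


def build_example():
    g = MMKG()
    # Text sub-graph: named entities and a relation (made-up names).
    g.add_node("Alice Example", "text", [1.0, 0.0, 0.0])
    g.add_node("Example City", "text", [0.0, 1.0, 0.0])
    g.add_edge("Alice Example", "visited", "Example City")
    # Image sub-graph: detected objects (made-up features).
    g.add_node("person_0", "image", [0.9, 0.1, 0.0])
    g.add_node("tree_0", "image", [0.0, 0.0, 1.0])
    link_modalities(g)
    return g
```

In this sketch, calling `attend(build_example(), "Alice Example")` returns attention weights over the entity's neighbors together with a weighted context vector; a captioning decoder could consume such a context, though how the paper integrates it is not specified in the abstract.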


Pages: 2659 - 2670

Page count: 12

50 items in total

[1] ICECAP: Information Concentrated Entity-aware Image Captioning
Hu, Anwen
Chen, Shizhe
Jin, Qin
[J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4217 - 4225
[2] Transform, contrast and tell: Coherent entity-aware multi-image captioning
Chen, Jingqiang
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 238
[3] Abnormal Entity-Aware Knowledge Graph Completion
Sun, Ke
Yu, Shuo
Peng, Ciyuan
Li, Xiang
Naseriparsa, Mehdi
Xia, Feng
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 891 - 900
[4] MMEA: Entity Alignment for Multi-modal Knowledge Graph
Chen, Liyi
Li, Zhi
Wang, Yijun
Xu, Tong
Wang, Zhefeng
Chen, Enhong
[J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2020), PT I, 2020, 12274 : 134 - 147
[5] MultiJAF: Multi-modal joint entity alignment framework for multi-modal knowledge graph
Cheng, Bo
Zhu, Jia
Guo, Meimei
[J]. NEUROCOMPUTING, 2022, 500 : 581 - 591
[6] Triplet-aware graph neural networks for factorized multi-modal knowledge graph entity alignment
Li, Qian
Li, Jianxin
Wu, Jia
Peng, Xutan
Ji, Cheng
Peng, Hao
Wang, Lihong
Yu, Philip S.
[J]. NEURAL NETWORKS, 2024, 179
[7] Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia
Nguyen, Khanh
Furkan Biten, Ali
Mafla, Andres
Gomez, Lluis
Karatzas, Dimosthenis
[J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 1940 - 1948
[8] Multi-modal Graph Convolutional Network for Knowledge Graph Entity Alignment
You, Yinghui
Wei, Yuyang
Zhang, Yanlong
Chen, Wei
Zhao, Lei
[J]. WEB AND BIG DATA, PT I, APWEB-WAIM 2023, 2024, 14331 : 142 - 157
[9] Fine-tuning with Multi-modal Entity Prompts for News Image Captioning
Zhang, Jingjing
Fang, Shancheng
Mao, Zhendong
Zhang, Zhiwei
Zhang, Yongdong
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4365 - 4373
[10] Entity-aware Collaborative Relation Network with Knowledge Graph for Recommendation
Huang, Ruoran
Han, Chuanqi
Cui, Li
[J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3098 - 3102
