Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Cited by: 0

Authors:

Zhang, Dong [1]

Wei, Suzhong [2]

Li, Shoushan [1]

Wu, Hanqian [2]

Zhu, Qiaoming [1]

Zhou, Guodong [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China

[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China

Source:

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021, Vol. 35

Funding:

China Postdoctoral Science Foundation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with images. However, dominant MNER models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To deal with this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we achieve an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experimentation on the two benchmark datasets demonstrates the superiority of our MNER model.
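
To make the graph construction and iterative fusion described in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. Several details are assumptions made only for illustration, since the abstract does not specify them: nodes of the same modality are fully connected, word-object links are supplied by hand as a hypothetical `alignments` list, and each fusion layer is replaced by simple neighbor averaging instead of a learned layer. The attention-based word representation and the CRF decoder are omitted, and the function names `build_graph` and `fusion_step` are invented for this example.

```python
from itertools import combinations


def build_graph(words, objects, alignments):
    """Build a toy unified multi-modal graph.

    Nodes are ("word", i) or ("object", j) tuples; edges are undirected
    and stored as frozensets of two nodes. `alignments` is a hypothetical
    list of (word_index, object_index) pairs supplied by the caller.
    """
    word_nodes = [("word", i) for i in range(len(words))]
    object_nodes = [("object", j) for j in range(len(objects))]
    edges = set()
    # Intra-modal edges (assumption: fully connected within a modality).
    for group in (word_nodes, object_nodes):
        for a, b in combinations(group, 2):
            edges.add(frozenset((a, b)))
    # Inter-modal edges between aligned words and visual objects.
    for i, j in alignments:
        edges.add(frozenset((("word", i), ("object", j))))
    return word_nodes + object_nodes, edges


def fusion_step(features, nodes, edges):
    """One simplified fusion layer: average each node with its neighbors."""
    neighbors = {n: [n] for n in nodes}  # every node also keeps itself
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = {}
    for n in nodes:
        vecs = [features[m] for m in neighbors[n]]
        dim = len(vecs[0])
        updated[n] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return updated


# Toy input: three words and one detected visual object (all made up).
words = ["Kevin", "Durant", "dunks"]
objects = ["person_region"]
nodes, edges = build_graph(words, objects, alignments=[(0, 0), (1, 0)])

# Two-dimensional toy features: words start at [1, 0], objects at [0, 1].
initial = {n: [1.0, 0.0] if n[0] == "word" else [0.0, 1.0] for n in nodes}
fused = fusion_step(initial, nodes, edges)
```

In this toy run, the two words linked to the visual object pick up part of its feature after one step, while the unlinked word stays purely textual. Calling `fusion_step` repeatedly on its own output loosely mirrors the stacking of several fusion layers.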

Pages: 14347-14355

Number of pages: 9
