Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Cited by: 0
Authors
Zhang, Dong [1 ]
Wei, Suzhong [2 ]
Li, Shoushan [1 ]
Wu, Hanqian [2 ]
Zhu, Qiaoming [1 ]
Zhou, Guodong [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with the aid of images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image with a unified multi-modal graph, which captures the various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
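The pipeline described in the abstract can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the feature dimensions, the toy adjacency over 3 word nodes and 2 visual-object nodes, and the masked-softmax attention update are all assumptions, and the CRF decoder is omitted (the sketch stops at the per-word multi-modal representation that would be fed to it).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fusion_layer(nodes, adj, W):
    """One graph-based multi-modal fusion step: each node attends
    to its graph neighbors and aggregates their features."""
    scores = (nodes @ W) @ nodes.T            # pairwise affinities
    scores = np.where(adj > 0, scores, -1e9)  # mask non-edges
    attn = softmax(scores, axis=-1)           # attention over neighbors
    return attn @ nodes                       # aggregated node features

# Toy unified multi-modal graph: 3 word nodes + 2 visual-object nodes.
rng = np.random.default_rng(0)
d = 4
words = rng.normal(size=(3, d))    # e.g. contextual word embeddings
objects = rng.normal(size=(2, d))  # e.g. detected-object features
nodes = np.vstack([words, objects])

# Edges (assumed): words chained in sentence order, every word
# connected to every visual object, plus self-loops.
adj = np.eye(5)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1
adj[:3, 3:] = 1
adj[3:, :3] = 1

# Stack fusion layers that iteratively refine node representations.
W = rng.normal(size=(d, d))
h = nodes
for _ in range(2):
    h = fusion_layer(h, adj, W)

# Attention-refined multi-modal representation for each word;
# a CRF decoder would label these for entity recognition.
word_repr = h[:3]
print(word_repr.shape)  # (3, 4)
```

A trained model would learn `W` (and use separate projections per layer); the fixed random matrix here only demonstrates the message-passing structure.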
Pages: 14347-14355
Number of pages: 9