Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Cited by: 0

Authors:

Zhang, Dong [1]

Wei, Suzhong [2]

Li, Shoushan [1]

Wu, Hanqian [2]

Zhu, Qiaoming [1]

Zhou, Guodong [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China

[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China

Source:

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021, Vol. 35

Funding:

China Postdoctoral Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with the help of images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
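
The first two steps of the abstract (a unified graph over words and visual objects, followed by iterative fusion over that graph) can be illustrated with a minimal sketch. This is not the authors' implementation. The edge rules (full connection within a modality, plus hand-supplied word-object links), the dot-product attention, the residual update, the function names `build_graph` and `fusion_step`, and the toy sentence and feature vectors are all assumptions made for illustration. The CRF decoder is omitted.

```python
import math


def build_graph(words, objects, aligned_pairs):
    """Build a toy unified multi-modal graph.

    Assumptions (not stated in the abstract): nodes of the same modality
    are fully connected, and aligned_pairs gives (word_index, object_index)
    links between the two modalities.
    """
    nodes = [("word", w) for w in words] + [("object", o) for o in objects]
    n_words = len(words)
    adj = {i: set() for i in range(len(nodes))}
    # Intra-modal edges: connect every pair of nodes within one modality.
    for group in (range(n_words), range(n_words, len(nodes))):
        for i in group:
            for j in group:
                if i != j:
                    adj[i].add(j)
    # Inter-modal edges: connect each aligned word with its visual object.
    for w_idx, o_idx in aligned_pairs:
        a, b = w_idx, n_words + o_idx
        adj[a].add(b)
        adj[b].add(a)
    return nodes, adj


def fusion_step(features, adj):
    """One round of attention-weighted message passing with a residual."""
    updated = []
    for i, h_i in enumerate(features):
        neighbours = sorted(adj[i])
        if not neighbours:
            updated.append(list(h_i))
            continue
        # Dot-product scores against each neighbour, then a stable softmax.
        scores = [sum(a * b for a, b in zip(h_i, features[j])) for j in neighbours]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        message = [
            sum(w / total * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(len(h_i))
        ]
        updated.append([x + y for x, y in zip(h_i, message)])
    return updated


# Hypothetical example: three words and one detected visual object.
words = ["Kevin", "Durant", "plays"]
objects = ["person"]
nodes, adj = build_graph(words, objects, aligned_pairs=[(0, 0), (1, 0)])
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
out = fusion_step(features, adj)
```

Calling `fusion_step` repeatedly on its own output corresponds to stacking several fusion layers. A real model would use learned projections per layer and per modality rather than raw dot products.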

Pages: 14347-14355

Page count: 9

