Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Cited by: 0
Authors
Zhang, Dong [1 ]
Wei, Suzhong [2 ]
Li, Shoushan [1 ]
Wu, Hanqian [2 ]
Zhu, Qiaoming [1 ]
Zhou, Guodong [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with the aid of images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image with a unified multi-modal graph, which captures the various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
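The pipeline described in the abstract can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the feature dimensions, the toy adjacency over 3 word nodes and 2 visual-object nodes, and the masked-softmax attention update are all assumptions, and the CRF decoder is omitted (the sketch stops at the per-word multi-modal representation that would be fed to it).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fusion_layer(nodes, adj, W):
    """One graph-based multi-modal fusion step: each node attends
    to its graph neighbors and aggregates their features."""
    scores = (nodes @ W) @ nodes.T            # pairwise affinities
    scores = np.where(adj > 0, scores, -1e9)  # mask non-edges
    attn = softmax(scores, axis=-1)           # attention over neighbors
    return attn @ nodes                       # aggregated node features

# Toy unified multi-modal graph: 3 word nodes + 2 visual-object nodes.
rng = np.random.default_rng(0)
d = 4
words = rng.normal(size=(3, d))    # e.g. contextual word embeddings
objects = rng.normal(size=(2, d))  # e.g. detected-object features
nodes = np.vstack([words, objects])

# Edges (assumed): words chained in sentence order, every word
# connected to every visual object, plus self-loops.
adj = np.eye(5)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1
adj[:3, 3:] = 1
adj[3:, :3] = 1

# Stack fusion layers that iteratively refine node representations.
W = rng.normal(size=(d, d))
h = nodes
for _ in range(2):
    h = fusion_layer(h, adj, W)

# Attention-refined multi-modal representation for each word;
# a CRF decoder would label these for entity recognition.
word_repr = h[:3]
print(word_repr.shape)  # (3, 4)
```

A trained model would learn `W` (and use separate projections per layer); the fixed random matrix here only demonstrates the message-passing structure.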
Pages: 14347-14355
Number of pages: 9