Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Authors
Zhang, Dong [1]
Wei, Suzhong [2]
Li, Shoushan [1]
Wu, Hanqian [2]
Zhu, Qiaoming [1]
Zhou, Guodong [1]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into predefined types with the help of accompanying images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which could refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image as a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
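The first two steps of the abstract (building a graph over words and visual objects, then letting nodes exchange information) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names `build_graph` and `fusion_layer`, the fully connected intra-modal edges, the externally supplied word-object alignments, and the toy 2-dimensional features are all assumptions made for illustration. The fusion step is replaced by plain neighbour averaging, and the attention-based word representation and the CRF decoder are omitted.

```python
from typing import Dict, List, Set, Tuple


def build_graph(num_words: int, num_objects: int,
                alignments: List[Tuple[int, int]]) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map over word nodes (w*) and object nodes (o*)."""
    word_nodes = [f"w{i}" for i in range(num_words)]
    obj_nodes = [f"o{j}" for j in range(num_objects)]
    adj: Dict[str, Set[str]] = {n: set() for n in word_nodes + obj_nodes}
    # Intra-modal edges (assumption: nodes of the same modality are fully connected).
    for group in (word_nodes, obj_nodes):
        for a in group:
            for b in group:
                if a != b:
                    adj[a].add(b)
    # Inter-modal edges (assumption: pairs come from an external word-object aligner).
    for wi, oj in alignments:
        adj[f"w{wi}"].add(f"o{oj}")
        adj[f"o{oj}"].add(f"w{wi}")
    return adj


def fusion_layer(adj: Dict[str, Set[str]],
                 feats: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """One simplified fusion step: replace each node by the mean of itself and its neighbours."""
    out: Dict[str, List[float]] = {}
    for node, nbrs in adj.items():
        group = [feats[node]] + [feats[n] for n in sorted(nbrs)]
        dim = len(feats[node])
        out[node] = [sum(v[d] for v in group) / len(group) for d in range(dim)]
    return out


# Toy input: four words, one detected object aligned with the first two words.
adj = build_graph(num_words=4, num_objects=1, alignments=[(0, 0), (1, 0)])
feats = {f"w{i}": [1.0, 0.0] for i in range(4)}
feats["o0"] = [0.0, 1.0]

# Stacking layers means applying the fusion step repeatedly.
fused = feats
for _ in range(1):
    fused = fusion_layer(adj, fused)
```

In this toy setup only the words aligned with the object pick up visual information after one layer. Further layers would propagate it to the remaining words through the intra-modal edges.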
Pages: 14347-14355
Number of pages: 9