GNN-Based Multimodal Named Entity Recognition

Cited by: 2
Authors
Gong, Yunchao [1 ,2 ,3 ]
Lv, Xueqiang [1 ,2 ]
Yuan, Zhu [1 ,2 ]
You, Xindong [2 ]
Hu, Feng [1 ,3 ]
Chen, Yuzhong [1 ,3 ]
Affiliations
[1] Qinghai Normal Univ, Coll Comp, 38 Wusi West Rd, Xining 810008, Qinghai, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Beijing Key Lab Internet Culture & Digital Dissemi, 35 Beisihuanzhong Rd, Beijing 100101, Peoples R China
[3] Qinghai Normal Univ, State Key Lab Tibetan Intelligent Informat Proc &, 38 Wusi West Rd, Xining 810008, Qinghai, Peoples R China
Source
COMPUTER JOURNAL | 2024, Vol. 67, No. 8
Funding
National Natural Science Foundation of China
Keywords
multimodality; named entity recognition; multimodal interaction graph; graph neural network; fusion
DOI
10.1093/comjnl/bxae030
Chinese Library Classification
TP3 [computing technology, computer technology]
Discipline code
0812
Abstract
The Multimodal Named Entity Recognition (MNER) task enhances text representations and improves the accuracy and robustness of named entity recognition by leveraging visual information from images. However, previous methods have two limitations: (i) the semantic mismatch between the text and image modalities makes it challenging to establish accurate internal connections between words and visual representations; moreover, the limited length of social media posts leads to semantic and contextual ambiguity, further exacerbating the mismatch between modalities. (ii) Existing methods employ cross-modal attention mechanisms to facilitate interaction and fusion between modalities but overlook the fine-grained correspondences between semantic units of text and images. To alleviate these issues, we propose a graph neural network approach for MNER (GNN-MNER), which promotes fine-grained alignment and interaction between semantic units of different modalities. Specifically, to mitigate the semantic mismatch between modalities, we construct corresponding graph structures for text and images and leverage graph convolutional networks to augment the textual and visual representations. For the second issue, we propose a multimodal interaction graph that explicitly represents the fine-grained semantic correspondences between text and visual objects; based on this graph, we implement deep feature fusion between modalities using graph attention networks. Compared with existing methods, our approach is the first to apply graph deep learning throughout the MNER task. Extensive experiments on the Twitter multimodal datasets validate the effectiveness of our GNN-MNER.
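The two graph stages the abstract describes (intra-modal graph convolution to augment each modality's representations, then GAT-style attention over a word-object interaction graph for cross-modal fusion) can be sketched as follows. This is an illustrative NumPy sketch under assumed shapes and randomly initialized weights, not the paper's implementation; `gcn_layer`, `gat_fusion`, and all variable names are hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(0.0, norm @ feats @ weight)

def gat_fusion(text_feats, obj_feats, w_att):
    """GAT-style cross-modal attention: each word attends to all visual objects
    along the edges of a (fully connected, for simplicity) interaction graph."""
    scores = text_feats @ w_att @ obj_feats.T    # word-object affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return text_feats + alpha @ obj_feats        # residual fusion of visual context

rng = np.random.default_rng(0)
n_words, n_objs, dim = 5, 3, 8
adj = (rng.random((n_words, n_words)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)                     # symmetric intra-text graph
words = rng.standard_normal((n_words, dim))      # word embeddings
objs = rng.standard_normal((n_objs, dim))        # visual-object features

words = gcn_layer(adj, words, rng.standard_normal((dim, dim)))   # intra-modal GCN
fused = gat_fusion(words, objs, rng.standard_normal((dim, dim))) # cross-modal fusion
print(fused.shape)  # (5, 8)
```

The fused word representations would then feed a standard sequence-labeling head (e.g. CRF) for entity tagging; that final stage is omitted here.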
Pages: 2622-2632
Page count: 11