GNN-Based Multimodal Named Entity Recognition

Cited: 2
|
Authors
Gong, Yunchao [1 ,2 ,3 ]
Lv, Xueqiang [1 ,2 ]
Yuan, Zhu [1 ,2 ]
You, Xindong [2 ]
Hu, Feng [1 ,3 ]
Chen, Yuzhong [1 ,3 ]
Affiliations
[1] Qinghai Normal Univ, Coll Comp, 38 Wusi West Rd, Xining 810008, Qinghai, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Beijing Key Lab Internet Culture & Digital Dissemi, 35 Beisihuanzhong Rd, Beijing 100101, Peoples R China
[3] Qinghai Normal Univ, State Key Lab Tibetan Intelligent Informat Proc &, 38 Wusi West Rd, Xining 810008, Qinghai, Peoples R China
Source
COMPUTER JOURNAL | 2024年 / 67卷 / 08期
Funding
National Natural Science Foundation of China;
Keywords
multimodality; named entity recognition; multimodal interaction graph; graph neural network; fusion;
DOI
10.1093/comjnl/bxae030
CLC classification
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
The Multimodal Named Entity Recognition (MNER) task enhances text representations and improves the accuracy and robustness of named entity recognition by leveraging visual information from images. However, previous methods have two limitations: (i) the semantic mismatch between text and image modalities makes it challenging to establish accurate internal connections between words and visual representations. Moreover, the limited number of characters in social media posts leads to semantic and contextual ambiguity, further exacerbating the semantic mismatch between modalities. (ii) Existing methods employ cross-modal attention mechanisms to facilitate interaction and fusion between modalities, overlooking fine-grained correspondences between the semantic units of text and images. To alleviate these issues, we propose a graph neural network approach for MNER (GNN-MNER), which promotes fine-grained alignment and interaction between the semantic units of different modalities. Specifically, to mitigate the semantic mismatch between modalities, we construct corresponding graph structures for text and images, and leverage graph convolutional networks to augment text and visual representations. For the second issue, we propose a multimodal interaction graph that explicitly represents the fine-grained semantic correspondences between text and visual objects. Based on this graph, we perform deep feature fusion between modalities using graph attention networks. Compared with existing methods, our approach is the first to apply graph deep learning throughout the MNER task. Extensive experiments on the Twitter multimodal datasets validate the effectiveness of our GNN-MNER.
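As a rough illustration of the two graph operations the abstract names, the sketch below shows (i) a GCN-style propagation step that augments each node's representation with its neighbours' and (ii) a dot-product attention fusion of visual-object features into a word feature, as a graph attention layer would. This is a minimal plain-Python sketch with learned weight matrices omitted; the function names and the toy graph are illustrative assumptions, not the authors' implementation.

```python
import math

def gcn_layer(features, adj):
    # Simplified GCN propagation (no learned weights): each node's new
    # feature is the mean of its own feature and its neighbours' features.
    out = []
    for i, feat in enumerate(features):
        neigh = [features[j] for j in adj[i]] + [feat]
        out.append([sum(vals) / len(neigh) for vals in zip(*neigh)])
    return out

def attention_fuse(query, keys):
    # Dot-product attention: fuse `keys` (e.g. visual-object features on the
    # interaction graph) into one vector, weighted by similarity to `query`
    # (e.g. a word feature). Softmax is computed with max-subtraction for
    # numerical stability.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))]

# Toy text graph: three word nodes on a chain 0 - 1 - 2.
feats = gcn_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                  {0: [1], 1: [0, 2], 2: [1]})
fused = attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the full model each layer would also apply a learned linear projection and nonlinearity, and the attention scores would come from learned attention vectors rather than raw dot products; the sketch only conveys the message-passing and fusion pattern.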
Pages: 2622-2632
Page count: 11
Related Papers
50 records
  • [31] Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition
    He, Li
    Wang, Qingxiang
    Liu, Jie
    Duan, Jianyong
    Wang, Hao
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [32] USAF: Multimodal Chinese named entity recognition using synthesized acoustic features
    Liu, Ye
    Huang, Shaobin
    Li, Rongsheng
    Yan, Naiyu
    Du, Zhijuan
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [33] ICKA: An instruction construction and Knowledge Alignment framework for Multimodal Named Entity Recognition
    Zeng, Qingyang
    Yuan, Minghui
    Wan, Jing
    Wang, Kunfeng
    Shi, Nannan
    Che, Qianzi
    Liu, Bin
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [34] In-context Learning for Few-shot Multimodal Named Entity Recognition
    Cai, Chenran
    Wang, Qianlong
    Liang, Bin
    Qin, Bing
    Yang, Min
    Wong, Kam-Fai
    Xu, Ruifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2969 - 2979
  • [35] GNN-Based Hierarchical Annotation for Analog Circuits
    Kunal, Kishor
    Dhar, Tonmoy
    Madhusudan, Meghna
    Poojary, Jitesh
    Sharma, Arvind K.
    Xu, Wenbin
    Burns, Steven M.
    Hu, Jiang
    Harjani, Ramesh
    Sapatnekar, Sachin S.
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (09) : 2801 - 2814
  • [36] Accelerating GNN-based SAR Automatic Target Recognition on HBM-enabled FPGA
    Zhang, Bingyi
    Kannan, Rajgopal
    Prasanna, Viktor
    Busart, Carl
    2023 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE, HPEC, 2023,
  • [37] GNN-Based Depression Recognition Using Spatio-Temporal Information: A fNIRS Study
    Yu, Qiao
    Wang, Rui
    Liu, Jia
    Hu, Long
    Chen, Min
    Liu, Zhongchun
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (10) : 4925 - 4935
  • [38] Dynamic Graph Construction Framework for Multimodal Named Entity Recognition in Social Media
    Mai, Weixing
    Zhang, Zhengxuan
    Li, Kuntao
    Xue, Yun
    Li, Fenghuan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (02) : 2513 - 2522
  • [39] Text-Image Scene Graph Fusion for Multimodal Named Entity Recognition
    Cheng J.
    Long K.
    Zhang S.
    Zhang T.
    Ma L.
    Cheng S.
    Guo Y.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (06): : 2828 - 2839
  • [40] HORUS-NER: A Multimodal Named Entity Recognition Framework for Noisy Data
    Esteves, Diego
    Marcelino, Jose
    Chawla, Piyush
    Fischer, Asja
    Lehmann, Jens
    ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021, 2021, 12695 : 89 - 100