GMNER-LF: Generative Multi-modal Named Entity Recognition Based on LLM with Information Fusion

Cited by: 0
Authors
Hu, Huiyun [1, 2]
Kong, Junda [1, 2]
Wang, Fei [2]
Sun, Hongzhi [2]
Ge, Yang [2]
Xiao, Bo [1]
Affiliations
[1] Dezhou Power Supply Co, State Grid Shandong Elect Power Co, Dezhou, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
DOI
10.1109/APSIPAASC63619.2025.10848846
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-modal Named Entity Recognition (MNER) uses visual information to improve on text-only Named Entity Recognition (NER). Many current methods are based on sequence labeling, but their model architectures are relatively complex. Large Language Models (LLMs) have recently demonstrated strong generative and comprehension abilities, so we propose GMNER-LF, a new generative paradigm for MNER. First, we retrieve relevant image-text pairs to provide prior knowledge for recognition. Second, we recast the task as a machine reading comprehension (MRC) task so that the LLM can better understand the problem. In addition, we design a multi-modal fusion module with a gating mechanism that filters noisy image information to obtain high-quality fused representations. The fusion module is injected into the LLM block to achieve deep fusion of the LLM and the multi-modal representations. The proposed method thus both exploits the internal knowledge of the LLM and selects the important modal information through the gating mechanism. Experimental results show that, compared with other generative methods, the proposed method improves performance on both the Twitter-2015 and Twitter-2017 datasets.
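The abstract does not give the equations of the gating mechanism, so the following is only a minimal sketch of one common form of gated fusion, not the authors' implementation. It assumes a sigmoid gate computed from the concatenated text and image vectors, applied per dimension to the image vector before it is added to the text vector. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, gate_weights, gate_bias=0.0):
    """Fuse a text vector with an image vector through a per-dimension gate.

    Assumed form (hypothetical, not taken from the paper):
        g_i     = sigmoid(gate_weights[i] . [text; image] + gate_bias)
        fused_i = text_i + g_i * image_i

    A gate near 0 suppresses the image feature (treated as noise);
    a gate near 1 lets it through unchanged.
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    concat = list(text_vec) + list(image_vec)
    fused = []
    for i, (t, v) in enumerate(zip(text_vec, image_vec)):
        # Each row of gate_weights has one weight per concatenated feature.
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias
        fused.append(t + sigmoid(score) * v)
    return fused


# Toy inputs: two-dimensional text and image features, zero gate weights.
text = [1.0, 2.0]
image = [4.0, -2.0]
zero_weights = [[0.0] * 4 for _ in range(2)]

half_open = gated_fusion(text, image, zero_weights, gate_bias=0.0)
closed = gated_fusion(text, image, zero_weights, gate_bias=-50.0)
```

With zero weights the gate depends only on the bias: a bias of 0 gives a gate of 0.5, and a strongly negative bias closes the gate so the output stays close to the text vector. In a trained model the weights would be learned, so the gate would vary with the content of each image-text pair.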
Pages: 6