GMNER-LF: Generative Multi-modal Named Entity Recognition Based on LLM with Information Fusion

Authors
Hu, Huiyun [1, 2]
Kong, Junda [1, 2]
Wang, Fei [2]
Sun, Hongzhi [2]
Ge, Yang [2]
Xiao, Bo [1]
Affiliations
[1] Dezhou Power Supply Co, State Grid Shandong Elect Power Co, Dezhou, Peoples R China
[2] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
DOI
10.1109/APSIPAASC63619.2025.10848846
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-modal Named Entity Recognition (MNER) leverages visual information to improve on text-only Named Entity Recognition (NER). Many current methods are based on sequence labeling, but their model architectures are relatively complex. Large Language Models (LLMs) have recently demonstrated powerful generative and comprehension abilities, so we propose GMNER-LF, a new generative paradigm for MNER. First, we retrieve relevant image-text pairs to provide prior knowledge for recognition. Second, we recast the task as a machine reading comprehension (MRC) task so that the LLM can better understand the problem. In addition, we design a multi-modal fusion module with a gating mechanism that filters noisy information in the image to obtain high-quality fused representations. The fusion module is injected into the LLM blocks to achieve deep fusion between the LLM and the multi-modal representations. The proposed method thus both exploits the internal knowledge of the LLM and selects important modal information through the gating mechanism. Experimental results show that, compared with other generative methods, the proposed method improves performance on both the Twitter-2015 and Twitter-2017 datasets.
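The abstract does not give the equations of the gating mechanism, so the following is only a minimal sketch of one common form of gated fusion, not the authors' module. It assumes an element-wise sigmoid gate computed from the concatenated text and image vectors, and a residual combination `fused = text + gate * image`; the names `gated_fusion`, `gate_w`, and `gate_b` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Compute W @ vec + b, with W given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, vec)) + b
        for row, b in zip(weights, bias)
    ]


def gated_fusion(text_vec, image_vec, gate_w, gate_b):
    """Fuse one text vector with one image vector (illustrative only).

    Assumed form, not taken from the paper:
        gate  = sigmoid(W [text; image] + b)
        fused = text + gate * image
    A gate value near 0 suppresses the image feature in that dimension;
    a value near 1 lets it through unchanged.
    """
    # List concatenation stands in for the [text; image] vector.
    joint = text_vec + image_vec
    gate = [sigmoid(z) for z in linear(gate_w, gate_b, joint)]
    fused = [t + g * i for t, g, i in zip(text_vec, gate, image_vec)]
    return fused, gate


if __name__ == "__main__":
    text = [1.0, 2.0]
    image = [4.0, -2.0]
    zero_w = [[0.0] * 4 for _ in range(2)]

    # Zero weights and zero bias give a half-open gate in every dimension.
    print(gated_fusion(text, image, zero_w, [0.0, 0.0]))

    # A strongly negative bias closes the gate, so the text passes through
    # almost unchanged, which is how noisy image features would be dropped.
    print(gated_fusion(text, image, zero_w, [-50.0, -50.0]))
```

In a trained model the gate weights would be learned, so the gate could close for images that are irrelevant to the sentence and open for images that help disambiguate an entity.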
Pages: 6