DGHC: A Hybrid Algorithm for Multi-Modal Named Entity Recognition Using Dynamic Gating and Correlation Coefficients With Visual Enhancements

Cited by: 0
Authors
Liu, Chang [1 ,2 ]
Yang, Dongsheng [1 ]
Yu, Bihui [1 ]
Bu, Liping [1 ]
Affiliations
[1] Chinese Acad Sci, Shenyang Inst Comp Technol, Shenyang 110016, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Multimodal named entity recognition; visually enhanced text representation; dynamic gates; correlation coefficient calculation
DOI
10.1109/ACCESS.2024.3400250
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Multimodal named entity recognition plays a crucial role in knowledge graph construction, as the quality of entity extraction and classification directly affects the quality of the resulting graph. However, most existing multimodal named entity recognition algorithms do not consider the correlation between text and images: they either use the visual features of images as attention over the text modality or fuse them directly with the textual features. Multimodal tweets containing both text and images fall into three categories according to this correlation: text that is related to the images, text that is partially related to the images, and text that is unrelated to the images. Using irrelevant or only partially relevant image features as cross-modal attention over the text can distort the text representation, ultimately leading to misclassified entities and degraded model performance. To address the uncertainty and negative impact caused by missing or partial text-image correlation, this paper proposes a visually enhanced text representation algorithm based on a hybrid of dynamic gating and correlation coefficients. We conducted experiments on two benchmark datasets, Twitter-2015 and Twitter-2017, and analyzed the results comprehensively to demonstrate the strengths of the proposed model.
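The abstract describes combining a dynamic gate with a text-image correlation coefficient so that visual features only enhance the text representation when the image is actually relevant. The paper's exact formulation is not reproduced in this record; the sketch below is a minimal NumPy illustration of one plausible version, in which a cosine-similarity correlation score scales a learned sigmoid gate. The weight shapes, the cosine choice, and the additive fusion rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine_corr(t, v):
    # Correlation score between pooled text and image feature vectors.
    return float(np.dot(t, v) / (np.linalg.norm(t) * np.linalg.norm(v) + 1e-8))

def gated_visual_enhancement(text_feat, visual_feat, W_g, b_g):
    """Hypothetical dynamic-gate fusion: the gate decides, per dimension,
    how much visual evidence to admit, while the correlation coefficient
    scales the whole visual contribution down for unrelated image-text pairs."""
    corr = cosine_corr(text_feat, visual_feat)  # scalar in [-1, 1]
    corr = max(corr, 0.0)                       # drop anti-correlated images entirely
    gate = sigmoid(W_g @ np.concatenate([text_feat, visual_feat]) + b_g)
    return text_feat + corr * gate * visual_feat  # visually enhanced text repr.

# Toy usage with random features and illustrative (untrained) gate weights.
rng = np.random.default_rng(0)
d = 8
t = rng.normal(size=d)            # pooled text representation
v = rng.normal(size=d)            # pooled image representation
W = rng.normal(size=(d, 2 * d)) * 0.1
b = np.zeros(d)
enhanced = gated_visual_enhancement(t, v, W, b)
print(enhanced.shape)  # (8,)
```

Note that when the correlation is clipped to zero (an unrelated or anti-correlated image), the function returns the text representation unchanged, which is the behavior the abstract motivates for text-image pairs with no relevance.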
Pages: 69151-69162
Page count: 12