DGHC: A Hybrid Algorithm for Multi-Modal Named Entity Recognition Using Dynamic Gating and Correlation Coefficients With Visual Enhancements

Cited by: 0
Authors
Liu, Chang [1 ,2 ]
Yang, Dongsheng [1 ]
Yu, Bihui [1 ]
Bu, Liping [1 ]
Affiliations
[1] Chinese Acad Sci, Shenyang Inst Comp Technol, Shenyang 110016, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Multimodal named entity recognition; visually enhanced text representation; dynamic gates; correlation coefficient calculation
DOI
10.1109/ACCESS.2024.3400250
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Multimodal named entity recognition plays a crucial role in knowledge graph construction, as the quality of entity extraction and classification directly affects the quality of the resulting graph. However, most existing multimodal named entity recognition algorithms do not consider the correlation between text and images: they either use the visual features of images as attention over the text modality or fuse them directly with the textual features. Multimodal tweets containing both text and images fall into three categories according to this correlation: text that is related to the images, text that is partially related to the images, and text that is unrelated to the images. Using irrelevant or only partially relevant image features as cross-modal attention over the text can distort the text representation, ultimately leading to misclassified entities and degraded model performance. To address the uncertainty and negative impact caused by missing or partial text-image correlation, this paper proposes a visually enhanced text representation algorithm based on a hybrid of dynamic gating and correlation coefficients. We conducted experiments on two benchmark datasets, Twitter-2015 and Twitter-2017, and analyzed the results comprehensively to demonstrate the strengths of the proposed model.
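The abstract describes combining a dynamic gate with a text-image correlation coefficient so that visual features only enhance the text representation when the image is actually relevant. The paper's exact formulation is not reproduced in this record; the sketch below is a minimal NumPy illustration of one plausible version, in which a cosine-similarity correlation score scales a learned sigmoid gate. The weight shapes, the cosine choice, and the additive fusion rule are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine_corr(t, v):
    # Correlation score between pooled text and image feature vectors.
    return float(np.dot(t, v) / (np.linalg.norm(t) * np.linalg.norm(v) + 1e-8))

def gated_visual_enhancement(text_feat, visual_feat, W_g, b_g):
    """Hypothetical dynamic-gate fusion: the gate decides, per dimension,
    how much visual evidence to admit, while the correlation coefficient
    scales the whole visual contribution down for unrelated image-text pairs."""
    corr = cosine_corr(text_feat, visual_feat)  # scalar in [-1, 1]
    corr = max(corr, 0.0)                       # drop anti-correlated images entirely
    gate = sigmoid(W_g @ np.concatenate([text_feat, visual_feat]) + b_g)
    return text_feat + corr * gate * visual_feat  # visually enhanced text repr.

# Toy usage with random features and illustrative (untrained) gate weights.
rng = np.random.default_rng(0)
d = 8
t = rng.normal(size=d)            # pooled text representation
v = rng.normal(size=d)            # pooled image representation
W = rng.normal(size=(d, 2 * d)) * 0.1
b = np.zeros(d)
enhanced = gated_visual_enhancement(t, v, W, b)
print(enhanced.shape)  # (8,)
```

Note that when the correlation is clipped to zero (an unrelated or anti-correlated image), the function returns the text representation unchanged, which is the behavior the abstract motivates for text-image pairs with no relevance.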
Pages: 69151-69162
Page count: 12