Review of Multimodal Named Entity Recognition Studies

Cited by: 0

Authors:

Han P. [1,2]

Chen W. [1]

Affiliations:

[1] School of Management, Nanjing University of Posts and Telecommunications, Nanjing

[2] Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing

Source:

Data Analysis and Knowledge Discovery | 2024, Vol. 8, No. 04

Keywords:

Feature Representation; Multimodal Fusion; Multimodal Named Entity Recognition; Multimodal Pre-training

DOI:

10.11925/infotech.2096-3467.2023.0488


Abstract:

[Objective] This paper reviews multimodal named entity recognition research to provide references for future studies. [Coverage] We selected 83 representative papers using "multimodal named entity recognition", "multimodal information extraction", and "multimodal knowledge graph" as the search terms for the Web of Science, IEEE Xplore, ACM Digital Library, and CNKI databases. [Methods] We summarized multimodal named entity recognition research in four aspects: concepts, feature representation, fusion strategies, and pretrained models. We also identified existing problems and future research directions. [Results] Multimodal named entity recognition studies focus on modal feature representation and fusion, and they have made some progress in the field of social media. They still need to improve multimodal fine-grained feature extraction and semantic association mapping methods to enhance the models' generalization and interpretability. [Limitations] Little of the literature takes multimodal named entity recognition directly as its research topic. [Conclusions] Our study provides new ideas to expand the applications of multimodal learning, break the modal barriers, and bridge the semantic gaps. © 2024 Chinese Academy of Sciences. All rights reserved.


Pages: 50-63

Page count: 13

