Review of Multimodal Named Entity Recognition Studies

Authors
Han P. [1, 2]
Chen W. [1]
Affiliations
[1] School of Management, Nanjing University of Posts and Telecommunications, Nanjing
[2] Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing
Keywords
Feature Representation; Multimodal Fusion; Multimodal Named Entity Recognition; Multimodal Pre-training
DOI
10.11925/infotech.2096-3467.2023.0488
Abstract
[Objective] This paper reviews research on multimodal named entity recognition to provide a reference for future studies. [Coverage] We selected 83 representative papers using “multimodal named entity recognition”, “multimodal information extraction”, and “multimodal knowledge graph” as search terms in the Web of Science, IEEE Xplore, ACM Digital Library, and CNKI databases. [Methods] We summarized multimodal named entity recognition research in four aspects: concepts, feature representation, fusion strategies, and pre-trained models. We also identified existing problems and future research directions. [Results] Multimodal named entity recognition studies focus on modal feature representation and fusion, and they have made some progress in the field of social media. These studies still need to improve multimodal fine-grained feature extraction and semantic association mapping methods to enhance the models’ generalization and interpretability. [Limitations] Few publications take multimodal named entity recognition directly as their research topic. [Conclusions] Our study provides new ideas to expand the applications of multimodal learning, break the modal barriers, and bridge the semantic gaps. © 2024 Chinese Academy of Sciences. All rights reserved.
Pages: 50-63
Number of pages: 13