ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

Cited by: 0

Authors:

Wang, Xinyu [1,2,6]

Gui, Min [4,6]

Jiang, Yong [3]

Jia, Zixia [1,2]

Bach, Nguyen [5,6]

Wang, Tao

Huang, Zhongqiang [3]

Huang, Fei [3]

Tu, Kewei [1,2]

Affiliations:

[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China

[2] Shanghai Engn Res Ctr Intelligent Vis & Imaging, Shanghai, Peoples R China

[3] Alibaba Grp, ADAM Acad, Hangzhou, Peoples R China

[4] Shopee, Singapore, Singapore

[5] Microsoft, Redmond, WA USA

[6] Alibaba Grp, Hangzhou, Peoples R China

Source:

NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES | 2022

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, Multi-modal Named Entity Recognition (MNER) has attracted a lot of attention. Most of the work utilizes image information through region-level visual representations obtained from a pretrained object detector and relies on an attention mechanism to model the interactions between image and text representations. However, it is difficult to model such interactions as image and text representations are trained separately on the data of their respective modality and are not aligned in the same space. As text representations take the most important role in MNER, in this paper, we propose Image-text Alignments (ITA) to align image features into the textual space, so that the attention mechanism in transformer-based pretrained textual embeddings can be better utilized. ITA first aligns the image into regional object tags, image-level captions and optical characters as visual contexts, concatenates them with the input texts as a new cross-modal input, and then feeds it into a pretrained textual embedding model. This makes it easier for the attention module of a pretrained textual embedding model to model the interaction between the two modalities since they are both represented in the textual space. ITA further aligns the output distributions predicted from the cross-modal input and textual input views so that the MNER model can be more practical in dealing with text-only inputs and robust to noises from images. In our experiments, we show that ITA models can achieve state-of-the-art accuracy on multi-modal Named Entity Recognition datasets, even without image information.
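
The two steps named in the abstract (concatenating textual visual contexts with the input sentence, then aligning the output distributions of the cross-modal and text-only views) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `[SEP]` separator, all function names, the per-token softmax, and the direction of the KL term are assumptions made for this example, and the logits are meant to come from some tagger that is not shown here.

```python
import math
from typing import List, Sequence

SEP = "[SEP]"  # assumed separator token; the abstract does not name one


def build_cross_modal_input(
    tokens: Sequence[str],
    object_tags: Sequence[str],
    captions: Sequence[str],
    ocr_tokens: Sequence[str],
) -> List[str]:
    """Append object tags, captions and OCR text to the sentence as plain tokens."""
    visual_context: List[str] = list(object_tags)
    for caption in captions:
        visual_context.extend(caption.split())  # whitespace split, for illustration
    visual_context.extend(ocr_tokens)
    return list(tokens) + [SEP] + visual_context


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's label scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def alignment_loss(
    cross_modal_logits: Sequence[Sequence[float]],
    text_logits: Sequence[Sequence[float]],
) -> float:
    """Mean per-token KL between the label distributions of the two views.

    Assumption: distributions are aligned token by token; the abstract only
    says that the output distributions of the two views are aligned.
    """
    per_token = [
        kl_divergence(softmax(c), softmax(t))
        for c, t in zip(cross_modal_logits, text_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    combined = build_cross_modal_input(
        ["Kevin", "Durant"],
        ["person", "ball"],
        ["a man playing basketball"],
        ["NBA"],
    )
    print(combined)
    # Hypothetical label scores for the two sentence tokens under each view.
    print(alignment_loss([[2.0, 0.5], [0.1, 1.2]], [[1.5, 0.7], [0.3, 1.0]]))
```

Because the visual contexts are ordinary tokens, the combined sequence can be passed to a text-only encoder unchanged, and a term like `alignment_loss` would be added to the usual tagging loss during training so that the text-only view stays close to the cross-modal one.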

Pages: 3176-3189

Page count: 14
