ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

被引:0
|
作者
Wang, Xinyu [1 ,2 ,6 ]
Gui, Min [4 ,6 ]
Jiang, Yong [3 ]
Jia, Zixia [1 ,2 ]
Bach, Nguyen [5 ,6 ]
Wang, Tao
Huang, Zhongqiang [3 ]
Huang, Fei [3 ]
Tu, Kewei [1 ,2 ]
机构
[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
[2] Shanghai Engn Res Ctr Intelligent Vis & Imaging, Shanghai, Peoples R China
[3] Alibaba Grp, ADAM Acad, Hangzhou, Peoples R China
[4] Shopee, Singapore, Singapore
[5] Microsoft, Redmond, WA USA
[6] Alibaba Grp, Hangzhou, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recently, Multi-modal Named Entity Recognition (MNER) has attracted a lot of attention. Most of the work utilizes image information through region-level visual representations obtained from a pretrained object detector and relies on an attention mechanism to model the interactions between image and text representations. However, it is difficult to model such interactions as image and text representations are trained separately on the data of their respective modality and are not aligned in the same space. As text representations take the most important role in MNER, in this paper, we propose Image-text Alignments (ITA) to align image features into the textual space, so that the attention mechanism in transformerbased pretrained textual embeddings can be better utilized. ITA first aligns the image into regional object tags, image-level captions and optical characters as visual contexts, concatenates them with the input texts as a new crossmodal input, and then feeds it into a pretrained textual embedding model. This makes it easier for the attention module of a pretrained textual embedding model to model the interaction between the two modalities since they are both represented in the textual space. ITA further aligns the output distributions predicted from the cross-modal input and textual input views so that the MNER model can be more practical in dealing with text-only inputs and robust to noises from images. In our experiments, we show that ITA models can achieve state-ofthe-art accuracy on multi-modal Named Entity Recognition datasets, even without image information.(1)
引用
收藏
页码:3176 / 3189
页数:14
相关论文
共 50 条
  • [41] MultiJAF: Multi-modal joint entity alignment framework for multi-modal knowledge graph
    Cheng, Bo
    Zhu, Jia
    Guo, Meimei
    NEUROCOMPUTING, 2022, 500 : 581 - 591
  • [42] Multi-modal haptic image recognition based on deep learning
    Han, Dong
    Nie, Hong
    Chen, Jinbao
    Chen, Meng
    Deng, Zhen
    Zhang, Jianwei
    SENSOR REVIEW, 2018, 38 (04) : 486 - 493
  • [43] PANORAMIC FACE AND EAR IMAGE STITCHING IN MULTI-MODAL RECOGNITION
    Li, Fang-Shi
    Mu, Zhi-Chun
    Chen, Long
    2014 INTERNATIONAL CONFERENCE ON WAVELET ANALYSIS AND PATTERN RECOGNITION (ICWAPR), 2014, : 81 - 86
  • [44] Cross-modal multi-relationship aware reasoning for image-text matching
    Jin Zhang
    Xiaohai He
    Linbo Qing
    Luping Liu
    Xiaodong Luo
    Multimedia Tools and Applications, 2022, 81 : 12005 - 12027
  • [45] Nested named entity recognition in historical archive text
    Byrne, Kate
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 589 - 596
  • [46] A Hybrid Named Entity Recognition System for Aviation Text
    Bharathi, A.
    Ramdin, Robin
    Babu, Preeja
    Menon, Vijay Krishna
    Jayaramakrishnan, Chandrasekhar
    Lakshmikumar, Sudarsan
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2024, 11 (01)
  • [47] Named Entity Recognition in Unstructured Medical Text Documents
    Pearson, Cole
    Seliya, Naeem
    Dave, Rushit
    INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021), 2021, : 412 - 417
  • [48] Named Entity Recognition for Russian Judicial Rulings Text
    Averina, Maria
    Levanova, Olga
    Kasatkina, Natalia
    2022 32ND CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2022, : 49 - 55
  • [49] Named Entity Recognition in Twitter Using Images and Text
    Esteves, Diego
    Peres, Rafael
    Lehmann, Jens
    Napolitano, Giulio
    CURRENT TRENDS IN WEB ENGINEERING, ICWE 2017, 2018, 10544 : 191 - 199
  • [50] Named Entity Recognition Method for Process Planning Text
    Dong H.
    Li Y.
    Qiao L.
    Huang Z.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2024, 36 (02): : 313 - 320