Cross-Lingual Named Entity Recognition for Heterogenous Languages

Cited by: 1
Authors
Fu, Yingwen [1 ]
Lin, Nankai [2 ]
Chen, Boyu [3 ]
Yang, Ziyu [1 ]
Jiang, Shengyi [1 ,4 ]
Affiliations
[1] Guangdong Univ Foreign Studies, Sch Informat Sci & Technol, Guangzhou 510006, Guangdong, Peoples R China
[2] Guangdong Univ Technol, Sch Comp Sci & Technol, Guangzhou 510006, Guangdong, Peoples R China
[3] UCL, Inst Hlth Informat, London WC1E 6BT, England
[4] Guangdong Univ Foreign Studies, Guangzhou Key Lab Multilingual Intelligent Proc, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Training; Data models; Standards; Speech processing; Optimization; Knowledge transfer; Information science; Cross-lingual named entity recognition; heterogenous language; weakly supervised learning; bilateral-branch network; self-distillation;
DOI
10.1109/TASLP.2022.3212698
CLC Classification Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Previous work on cross-lingual Named Entity Recognition (NER) has achieved great success. However, little of it considers the effect of the language-family relationship between the source and target languages. In this study, we find that cross-lingual NER performance on a target language decreases when its source language is changed from one in the same language family (homogenous) to one in a different language family (heterogenous). To improve NER performance in this situation, we propose a novel cross-lingual NER framework based on a self-distillation mechanism and a Bilateral-Branch Network (SD-BBN). SD-BBN learns source-language NER knowledge from supervised datasets and obtains target-language knowledge from weakly supervised datasets. These two kinds of knowledge are then fused through the self-distillation mechanism to better identify entities in the target language. We evaluate SD-BBN on 9 language datasets from 4 different language families. Results show that SD-BBN tends to outperform baseline methods. Remarkably, when the target and source languages are heterogenous, SD-BBN achieves a greater boost. Our results might suggest that obtaining language-specific knowledge from the target language is essential for improving cross-lingual NER when the source and target languages are heterogenous. This finding could provide novel insight for further research.
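To make the architecture described in the abstract concrete, below is a minimal, illustrative PyTorch sketch of a bilateral-branch tagger with self-distillation fusion: one branch for supervised source-language data, one for weakly supervised target-language data, with the two branches' tag distributions combined by a convex fusion and a KL-based self-distillation loss pulling each branch toward the fused output. All module names, dimensions, the fusion weight `alpha`, and the loss weighting are assumptions made for illustration only, not the authors' implementation.

```python
# Illustrative sketch of a bilateral-branch NER model with self-distillation.
# Hyperparameters, module structure, and the fusion/loss weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BilateralBranchNER(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=64, num_tags=9):
        super().__init__()
        # Shared token embedding; each branch has its own encoder and tagger.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.src_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.tgt_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.src_tagger = nn.Linear(hidden, num_tags)
        self.tgt_tagger = nn.Linear(hidden, num_tags)

    def forward(self, tokens, alpha=0.5):
        x = self.embed(tokens)
        src_logits = self.src_tagger(self.src_encoder(x)[0])
        tgt_logits = self.tgt_tagger(self.tgt_encoder(x)[0])
        # Convex fusion of the two branches' per-token tag scores.
        fused_logits = alpha * src_logits + (1.0 - alpha) * tgt_logits
        return src_logits, tgt_logits, fused_logits


def self_distillation_loss(branch_logits, fused_logits, temperature=2.0):
    # KL divergence from the detached fused "teacher" distribution to the
    # branch "student" distribution, as in standard self-distillation.
    teacher = F.softmax(fused_logits.detach() / temperature, dim=-1)
    student = F.log_softmax(branch_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    model = BilateralBranchNER()
    tokens = torch.randint(0, 1000, (2, 10))   # toy batch: 2 sentences
    gold = torch.randint(0, 9, (2, 10))        # toy (weakly supervised) tags
    src_logits, tgt_logits, fused = model(tokens)
    ce = F.cross_entropy(fused.view(-1, 9), gold.view(-1))
    sd = self_distillation_loss(src_logits, fused) + \
         self_distillation_loss(tgt_logits, fused)
    loss = ce + 0.1 * sd                       # assumed loss weighting
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```

In this sketch, the source branch would be trained on gold-labeled source-language sentences and the target branch on weakly labeled target-language sentences; the fused distribution serves as the teacher for both branches, which is one plausible reading of the knowledge fusion the abstract describes.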
Pages: 371-382
Page count: 12
Related Papers
50 items in total
  • [41] Choosing Transfer Languages for Cross-Lingual Learning
    Lin, Yu-Hsiang
    Chen, Chian-Yu
    Lee, Jean
    Li, Zirui
    Zhang, Yuyan
    Xia, Mengzhou
    Rijhwani, Shruti
    He, Junxian
    Zhang, Zhisong
    Ma, Xuezhe
    Anastasopoulos, Antonios
    Littell, Patrick
    Neubig, Graham
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3125 - 3135
  • [42] Cross-Lingual Word Embeddings for Turkic Languages
    Kuriyozov, Elmurod
    Doval, Yerai
    Gomez-Rodriguez, Carlos
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4054 - 4062
  • [43] MasakhaNER: Named Entity Recognition for African Languages
    Adelani, David Ifeoluwa
    Abbott, Jade
    Neubig, Graham
    D'souza, Daniel
    Kreutzer, Julia
    Lignos, Constantine
    Palen-Michel, Chester
    Buzaaba, Happy
    Rijhwani, Shruti
    Ruder, Sebastian
    Mayhew, Stephen
    Azime, Israel Abebe
    Muhammad, Shamsuddeen H.
    Emezue, Chris Chinenye
    Nakatumba-Nabende, Joyce
    Ogayo, Perez
    Anuoluwapo, Aremu
    Gitau, Catherine
    Mbaye, Derguene
    Alabi, Jesujoba
    Yimam, Seid Muhie
    Gwadabe, Tajuddeen Rabiu
    Ezeani, Ignatius
    Niyongabo, Rubungo Andre
    Mukiibi, Jonathan
    Otiende, Verrah
    Orife, Iroro
    David, Davis
    Ngom, Samba
    Adewumi, Tosin
    Rayson, Paul
    Adeyemi, Mofetoluwa
    Muriuki, Gerald
    Anebi, Emmanuel
    Chukwuneke, Chiamaka
    Odu, Nkiruka
    Wairagala, Eric Peter
    Oyerinde, Samuel
    Siro, Clemencia
    Bateesa, Tobius Saul
    Oloyede, Temilola
    Wambui, Yvonne
    Akinode, Victor
    Nabagereka, Deborah
    Katusiime, Maurice
    Awokoya, Ayodele
    Mboup, Mouhamadane
    Gebreyohannes, Dibora
    Tilaye, Henok
    Nwaike, Kelechi
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 1116 - 1131
  • [44] Name Entity Recognition for Malay Texts Using Cross-Lingual Annotation Projection Approach
    Zamin, Norshuhani
    Abu Bakar, Zainab
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2015, PT I, 2015, 9155 : 242 - 256
  • [45] Cross-Lingual Name Entity Recognition from Clinical Text Using Mixed Language Query
    Shi, Kunli
    Chen, Gongchi
    Gu, Jinghang
    Qian, Longhua
    Zhou, Guodong
    HEALTH INFORMATION PROCESSING, CHIP 2023, 2023, 1993 : 3 - 21
  • [46] Adaptive Entity Alignment for Cross-Lingual Knowledge Graph
    Zhang, Yuanming
    Gao, Tianyu
    Lu, Jiawei
    Cheng, Zhenbo
    Xiao, Gang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2021, PT II, 2021, 12816 : 474 - 487
  • [47] Speech Recognition for Turkic Languages Using Cross-Lingual Transfer Learning from Kazakh
    Orel, Daniil
    Yeshpanov, Rustem
    Varol, Huseyin Atakan
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023, : 174 - 182
  • [49] Cross-lingual entity matching and infobox alignment in Wikipedia
    Rinser, Daniel
    Lange, Dustin
    Naumann, Felix
    INFORMATION SYSTEMS, 2013, 38 (06) : 887 - 907
  • [50] Cross-Lingual Entity Matching for Heterogeneous Online Wikis
    Lu, Weiming
    Wang, Peng
    Wang, Huan
    Liu, Jiahui
    Dai, Hao
    Wei, Baogang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 887 - 899