Language Clustering for Multilingual Named Entity Recognition

Cited by: 0
Authors
Shaffer, Kyle [1]
Affiliation
[1] Language Weaver RWS Grp, Gerrards Cross, England
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent work in multilingual natural language processing has shown progress in various tasks such as natural language inference and joint multilingual translation. Despite success in learning across many languages, challenges arise where multilingual training regimes often boost performance on some languages at the expense of others. For multilingual named entity recognition (NER) we propose a simple technique that groups similar languages together by using embeddings from a pre-trained masked language model, and automatically discovering language clusters in this embedding space. Specifically, we fine-tune an XLM-RoBERTa model on a language identification task, and use embeddings from this model for clustering. We conduct experiments on 15 diverse languages in the WikiAnn dataset and show our technique largely outperforms three baselines: (1) training a multilingual model jointly on all available languages, (2) training one monolingual model per language, and (3) grouping languages by linguistic family. We also conduct analyses showing meaningful multilingual transfer for low-resource languages (Swahili and Yoruba), despite being automatically grouped with other seemingly disparate languages.
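The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not name the clustering algorithm, so average-linkage agglomerative clustering over cosine distance is used here as a stand-in, and the hand-made 2-D vectors are hypothetical placeholders for embeddings that would come from the fine-tuned language identification model. The names `centroid`, `cosine_distance`, `cluster_languages`, and the `lang_a` to `lang_d` labels are all assumptions made for illustration.

```python
import math


def centroid(vectors):
    """Mean of a list of equal-length vectors (one vector per sentence)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_languages(embeddings_by_lang, num_clusters):
    """Group languages by average-linkage agglomerative clustering.

    Each language is first reduced to the centroid of its sentence
    embeddings; the two closest clusters are then merged repeatedly
    until only `num_clusters` remain.
    """
    centroids = {lang: centroid(vecs) for lang, vecs in embeddings_by_lang.items()}
    clusters = [[lang] for lang in sorted(centroids)]
    while len(clusters) > num_clusters:
        best = None  # (average distance, index i, index j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [
                    cosine_distance(centroids[a], centroids[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Hypothetical toy embeddings: two sentences per placeholder language.
toy_embeddings = {
    "lang_a": [[1.0, 0.1], [0.9, 0.2]],
    "lang_b": [[0.8, 0.3], [1.0, 0.2]],
    "lang_c": [[0.1, 1.0], [0.2, 0.9]],
    "lang_d": [[0.3, 0.8], [0.2, 1.0]],
}
groups = cluster_languages(toy_embeddings, num_clusters=2)
print(groups)
```

Under this sketch, each resulting group would then be used to train one NER model on the pooled data of its member languages; the number of clusters is a free parameter here, whereas the abstract describes the clusters as discovered automatically.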
Pages: 40-45
Page count: 6