Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

被引:0
|
作者
Jarrar, Mustafa [1 ]
Khalilia, Mohammed [1 ]
Ghanem, Sana [1 ]
机构
[1] Birzeit Univ, Birzeit, Palestine
关键词
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus;
D O I
暂无
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities and 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated a strong agreement with Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
引用
收藏
页码:3626 / 3636
页数:11
相关论文
共 50 条
  • [1] Thai Nested Named Entity Recognition Corpus
    Buaphet, Weerayut
    Udomcharoenchaikit, Can
    Limkonchotiwat, Peerat
    Rutherford, Attapol T.
    Nutanong, Sarana
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1473 - 1486
  • [2] Arabic Named Entity Recognition: A BERT-BGRU Approach
    Alsaaran, Norah
    Alrabiah, Maha
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (01): : 471 - 485
  • [3] Telugu named entity recognition using bert
    Gorla, SaiKiranmai
    Tangeda, Sai Sharan
    Neti, Lalita Bhanu Murthy
    Malapati, Aruna
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2022, 14 (02) : 127 - 140
  • [4] Telugu named entity recognition using bert
    SaiKiranmai Gorla
    Sai Sharan Tangeda
    Lalita Bhanu Murthy Neti
    Aruna Malapati
    International Journal of Data Science and Analytics, 2022, 14 : 127 - 140
  • [5] Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus)
    Salah, Ramzi Esmail
    Zakaria, Lailatul Qadri Binti
    2018 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2018, : 150 - 157
  • [6] Classical Arabic Named Entity Recognition Using Variant Deep Neural Network Architectures and BERT
    Alsaaran, Norah
    Alrabiah, Maha
    IEEE ACCESS, 2021, 9 : 91537 - 91547
  • [7] Arabic Named Entity Recognition
    Benajiba, Yassine
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44): : 151 - 152
  • [8] Named entity recognition for Arabic using syntactic grammars
    Mesfar, Slim
    Natural Language Processing and Information Systems, Proceedings, 2007, 4592 : 305 - 316
  • [9] Arabic Named Entity Recognition Using Boosting Method
    Sajadi, Mohamad Bagher
    Minaei, Behrooz
    2017 19TH CSI INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2017, : 281 - 288
  • [10] ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition
    Boudjellal, Nada
    Zhang, Huaping
    Khan, Asif
    Ahmad, Arshad
    Naseem, Rashid
    Shang, Jianyun
    Dai, Lin
    COMPLEXITY, 2021, 2021