Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Authors
Jarrar, Mustafa [1 ]
Khalilia, Mohammed [1 ]
Ghanem, Sana [1 ]
Affiliations
[1] Birzeit Univ, Birzeit, Palestine
Keywords
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types, including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
Pages: 3626-3636
Page count: 11