Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Authors
Jarrar, Mustafa [1]
Khalilia, Mohammed [1]
Ghanem, Sana [1]
Affiliation
[1] Birzeit University, Birzeit, Palestine
Keywords
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialectal tokens, manually annotated with 21 entity types, including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities rather than the more common flat annotations. The data contains about 75K entities, of which 22.5% are nested. Inter-annotator evaluation of the corpus showed strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate the data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. The corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
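A small sketch can show how nested mentions fit a multi-task tagging setup. One common approach, assumed here rather than taken from the authors' released code, gives each entity type its own IOB2 tag layer. Each layer can then be predicted by a separate classification head on top of a shared encoder such as AraBERT. Because every type has its own layer, a mention embedded in a mention of a *different* type never competes for the same tag slot. The function names `spans_to_layers` and `layers_to_spans`, the `(start, end, type)` span format, and the `ORG`/`LOC` labels are hypothetical illustrations.

```python
from typing import Dict, List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def spans_to_layers(num_tokens: int, spans: List[Span],
                    entity_types: List[str]) -> Dict[str, List[str]]:
    """Encode (possibly nested) spans as one IOB2 tag sequence per entity type.

    Assumes no two mentions of the *same* type overlap, since each type
    has only a single layer.
    """
    layers = {t: ["O"] * num_tokens for t in entity_types}
    for start, end, etype in spans:
        layer = layers[etype]
        layer[start] = f"B-{etype}"          # first token of the mention
        for i in range(start + 1, end):
            layer[i] = f"I-{etype}"          # remaining tokens of the mention
    return layers


def layers_to_spans(layers: Dict[str, List[str]]) -> List[Span]:
    """Decode per-type IOB2 layers back into a sorted list of spans."""
    spans: List[Span] = []
    for etype, tags in layers.items():
        start = None
        # A trailing "O" sentinel closes a mention that ends at the last token.
        for i, tag in enumerate(tags + ["O"]):
            if tag == f"B-{etype}" or tag == "O":
                if start is not None:        # close the currently open mention
                    spans.append((start, i, etype))
                    start = None
                if tag == f"B-{etype}":      # open a new mention
                    start = i
    return sorted(spans)


if __name__ == "__main__":
    # "studied at Birzeit University": an ORG span containing a LOC span.
    tokens = ["درس", "في", "جامعة", "بيرزيت"]
    gold = [(2, 4, "ORG"), (3, 4, "LOC")]
    layers = spans_to_layers(len(tokens), gold, ["ORG", "LOC"])
    for etype, tags in layers.items():
        print(etype, tags)
    print(layers_to_spans(layers))
```

In this example the ORG layer tags the last two tokens as `B-ORG I-ORG`, and the LOC layer tags only the final token as `B-LOC`. Both mentions are therefore recovered without conflict. One limitation of the one-layer-per-type design is that it cannot represent two overlapping mentions of the same type.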
Pages: 3626-3636
Page count: 11