A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

被引:0
|
作者
Ngoc-Trinh Vu [1 ,2 ]
Van-Hien Tran [1 ]
Thi-Huyen-Trang Doan [1 ]
Hoang-Quynh Le [1 ]
Mai-Vu Tran [1 ]
机构
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
关键词
Named entity recognition; Phenotype; Machine learning; Biomedical ontology;
D O I
10.1007/978-3-319-17996-4_13
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Building a labeled corpus which contains sufficient data and good coverage along with solving the problems of cost, effort and time is a popular research topic in natural language processing. The problem of constructing automatic or semi-automatic training data has become a matter of the research community. For this reason, we consider the problem of building a corpus in phenotype entity recognition problem, classs-specific feature detectors from unlabeled data based on over 10260 unique terms (more than 15000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9000 unique terms (about 24000 synonyms) of mouse abnormal phenotype descriptions in the Mammalian Phenotype Ontology. This corpus evaluated on three corpora: Khordad corpus, Phenominer 2012 and Phenominer 2013 corpora with Maximum Entropy and Beam Search method. The performance is good for three corpora, with F-scores of 31.71% and 35.77% for Phenominer 2012 corpus and Phenominer 2013 corpus; 78.36% for Khordad corpus.
引用
收藏
页码:141 / 149
页数:9
相关论文
共 50 条
  • [1] Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus)
    Salah, Ramzi Esmail
    Zakaria, Lailatul Qadri Binti
    2018 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2018, : 150 - 157
  • [2] Building a Corpus-Derived Gazetteer for Named Entity Recognition
    Zamin, Norshuhani
    Oxley, Alan
    SOFTWARE ENGINEERING AND COMPUTER SYSTEMS, PT 2, 2011, 180 : 73 - 80
  • [3] Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus
    Suriyachay, Kitiya
    Sornlertlamvanich, Virach
    2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 30 - 35
  • [4] A Twitter Corpus for Named Entity Recognition in Turkish
    Carik, Buse
    Yeniterzi, Reyyan
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 4546 - 4551
  • [5] Thai Nested Named Entity Recognition Corpus
    Buaphet, Weerayut
    Udomcharoenchaikit, Can
    Limkonchotiwat, Peerat
    Rutherford, Attapol T.
    Nutanong, Sarana
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1473 - 1486
  • [6] A Finnish news corpus for named entity recognition
    Teemu Ruokolainen
    Pekka Kauppinen
    Miikka Silfverberg
    Krister Lindén
    Language Resources and Evaluation, 2020, 54 : 247 - 272
  • [7] A Finnish news corpus for named entity recognition
    Ruokolainen, Teemu
    Kauppinen, Pekka
    Silfverberg, Miikka
    Linden, Krister
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (01) : 247 - 272
  • [8] Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
    Jarrar, Mustafa
    Khalilia, Mohammed
    Ghanem, Sana
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3626 - 3636
  • [9] Using corpus-derived name lists for named entity recognition
    Stevenson, M
    Gaizauskas, R
    6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : 290 - 295
  • [10] Arabic Named Entity Recognition Using Boosting Method
    Sajadi, Mohamad Bagher
    Minaei, Behrooz
    2017 19TH CSI INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2017, : 281 - 288