A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1]
Thi-Huyen-Trang Doan [1]
Hoang-Quynh Le [1]
Mai-Vu Tran [1]
Affiliations
[1] Vietnam National University, Hanoi, University of Engineering and Technology, Knowledge Technology Laboratory, Hanoi, Vietnam
[2] Vietnam National Oil and Gas Group, Vietnam Petroleum Institute, Hanoi, Vietnam
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Building a labeled corpus with sufficient data and good coverage, while keeping cost, effort, and time under control, is an active research topic in natural language processing, and constructing training data automatically or semi-automatically has become a central concern of the research community. We therefore address the problem of building a corpus for phenotype entity recognition, deriving class-specific feature detectors from unlabeled data using over 10,260 unique terms (more than 15,000 synonyms) that describe human phenotypic features in the Human Phenotype Ontology (HPO) and about 9,000 unique terms (about 24,000 synonyms) that describe abnormal mouse phenotypes in the Mammalian Phenotype Ontology. The resulting corpus was evaluated on three corpora (the Khordad corpus, Phenominer 2012, and Phenominer 2013) using a Maximum Entropy model with Beam Search. It achieved F-scores of 31.71% on Phenominer 2012, 35.77% on Phenominer 2013, and 78.36% on the Khordad corpus.
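The abstract does not spell out how ontology terms are turned into labels. The following is a minimal sketch under the assumption that unlabeled text is tagged by greedy longest-match lookup of ontology term names and synonyms, producing BIO tags that a Maximum Entropy tagger could then be trained on. The function names `build_lexicon` and `bio_label`, the label `Phenotype`, whitespace tokenization, and the sample terms are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: a hypothetical dictionary-matching labeler, not the paper's exact method.

def build_lexicon(terms: Dict[str, Sequence[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each lower-cased, whitespace-tokenized term or synonym to its entity label."""
    lexicon: Dict[Tuple[str, ...], str] = {}
    for label, phrases in terms.items():
        for phrase in phrases:
            key = tuple(phrase.lower().split())
            if key:  # skip empty strings
                lexicon[key] = label
    return lexicon


def bio_label(tokens: List[str], lexicon: Dict[Tuple[str, ...], str]) -> List[str]:
    """Greedy longest-match tagging: at each position, take the longest lexicon entry that matches."""
    max_len = max((len(k) for k in lexicon), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len, label = 0, ""
        # Try the longest candidate span first, shrinking until a match is found.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                match_len, label = n, lexicon[key]
                break
        if match_len:
            tags[i] = "B-" + label
            for j in range(i + 1, i + match_len):
                tags[j] = "I-" + label
            i += match_len  # continue after the matched span (no overlapping matches)
        else:
            i += 1
    return tags
```

For example, with a hypothetical mini-lexicon `{"Phenotype": ["short stature", "microcephaly"]}`, the sentence `"The patient showed short stature and microcephaly ."` is tagged `O O O B-Phenotype I-Phenotype O B-Phenotype O`. Preferring the longest match keeps multi-word terms intact instead of splitting them into shorter, nested entries.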
Pages: 141-149
Number of pages: 9