A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies

Cited by: 0
Authors
Ngoc-Trinh Vu [1, 2]
Van-Hien Tran [1]
Thi-Huyen-Trang Doan [1]
Hoang-Quynh Le [1]
Mai-Vu Tran [1]
Affiliations
[1] Vietnam Natl Univ Hanoi, Univ Engn & Technol, Knowledge Technol Lab, Hanoi, Vietnam
[2] Vietnam Natl Oil & Gas Grp, Vietnam Petr Inst, Hanoi, Vietnam
Keywords
Named entity recognition; Phenotype; Machine learning; Biomedical ontology
DOI
10.1007/978-3-319-17996-4_13
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Building a labeled corpus that contains sufficient data and good coverage, while keeping cost, effort and time under control, is a popular research topic in natural language processing. Constructing training data automatically or semi-automatically has therefore become a matter of interest to the research community. For this reason, we consider the problem of building a corpus for phenotype entity recognition, with class-specific feature detectors from unlabeled data, based on over 10,260 unique terms (more than 15,000 synonyms) describing human phenotypic features in the Human Phenotype Ontology (HPO) and about 9,000 unique terms (about 24,000 synonyms) of mouse abnormal phenotype descriptions in the Mammalian Phenotype Ontology. The corpus was evaluated on three corpora, the Khordad corpus, Phenominer 2012 and Phenominer 2013, using the Maximum Entropy and Beam Search methods. The performance is good for the three corpora, with F-scores of 31.71% and 35.77% for the Phenominer 2012 and Phenominer 2013 corpora, and 78.36% for the Khordad corpus.
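The abstract does not spell out how ontology terms are turned into labels, so the following is only a minimal sketch of one common approach, assuming greedy longest-match lookup of terms and synonyms and BIO tagging. The lexicon, the `PHENOTYPE` label, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy lexicon (term -> synonyms); real HPO/MP vocabularies
# contain thousands of terms and are not reproduced here.
TOY_LEXICON = {
    "short stature": ["small stature"],
    "seizures": ["epileptic seizures"],
}


def build_phrases(lexicon):
    """Collect terms and synonyms as lowercase token tuples, longest first."""
    phrases = set()
    for term, synonyms in lexicon.items():
        for phrase in [term, *synonyms]:
            phrases.add(tuple(phrase.lower().split()))
    return sorted(phrases, key=len, reverse=True)


def bio_tag(sentence, phrases, label="PHENOTYPE"):
    """Greedy longest-match tagging; returns a list of (token, tag) pairs."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(lowered[i:i + n]) == phrase:
                # First token of a match gets B-, the rest get I-.
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            # No phrase starts at this token; leave it tagged "O".
            i += 1
    return list(zip(tokens, tags))


tagged = bio_tag(
    "The patient had short stature and epileptic seizures.",
    build_phrases(TOY_LEXICON),
)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Sentences labeled this way could serve as automatically generated training data for a sequence tagger, although exact string matching misses spelling variants and nested mentions, which a real pipeline would need to handle.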
Pages: 141-149
Page count: 9