Active learning for ontological event extraction incorporating named entity recognition and unknown word handling

Cited by: 4
Authors
Han, Xu [1]
Kim, Jung-jae [2]
Kwoh, Chee Keong [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
[2] Inst Infocomm Res, Data Analyt Dept, 1 Fusionopolis Way, Singapore 138632, Singapore
Source
Journal of Biomedical Semantics, 7
Keywords
Active learning; Biomedical natural language processing; Information extraction
DOI
10.1186/s13326-016-0059-z
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which state-of-the-art supervised learning systems require. Active learning selects the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on entity recognition and protein-protein interaction tasks, not on event extraction tasks for multiple event types. It also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents by their informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems.
Methods: Our method is based on a committee of two systems. We first employ an event extraction system to filter potential false negatives among unlabeled documents, that is, documents from which the system does not extract any event. We then develop a statistical method to rank these potential false negatives 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method to the task of named entity recognition.
Results and conclusion: We evaluate the proposed method on the BioNLP Shared Tasks datasets and show that it can achieve better performance than previous methods, such as entropy-based and Gibbs error-based methods and a conventional committee-based method. We also show that incorporating named entity recognition into active learning for event extraction, together with the unknown word handling, further improves the active learning method. In addition, adapting the active learning method to named entity recognition tasks also improves the document selection for manual annotation of named entities.
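The document selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the trigger-word probabilities, the protein set standing in for a named entity recognizer, the character-level string similarity used for unknown words, the 0.8 threshold, and the way the two scores are multiplied together are all assumptions made for this example, and every function name is hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy resources; none of these values come from the paper.
TRIGGER_PROB = {"phosphorylation": 0.6, "binds": 0.3, "expression": 0.5}
PROTEINS = {"TRAF2", "CD40", "p53"}  # stand-in for a named entity recognizer


def nearest_known(word, vocab, threshold=0.8):
    """Map an unknown word to its most similar known word, or None."""
    best = max(vocab, key=lambda v: SequenceMatcher(None, word, v).ratio())
    if SequenceMatcher(None, word, best).ratio() >= threshold:
        return best
    return None


def word_score(word):
    """Event evidence of a word, backing off to a similar known word."""
    key = word.lower()
    if key not in TRIGGER_PROB:
        key = nearest_known(key, TRIGGER_PROB)
    return TRIGGER_PROB.get(key, 0.0)


def informativity(tokens):
    """Combine event-word evidence with the number of candidate arguments."""
    event_evidence = sum(word_score(t) for t in tokens)
    entity_count = sum(1 for t in tokens if t in PROTEINS)
    return event_evidence * (1 + entity_count)  # assumed combination rule


def rank_false_negative_candidates(docs, extract_events):
    """Keep documents with no extracted event, most informative first."""
    candidates = [d for d in docs if not extract_events(d)]
    return sorted(candidates, key=informativity, reverse=True)


# Usage with a hypothetical extractor that only recognizes the word "binds".
docs = [
    ["TRAF2", "binds", "CD40"],
    ["phosphorylations", "of", "p53"],
    ["the", "weather", "is", "nice"],
]
ranked = rank_false_negative_candidates(
    docs, lambda d: ["binding"] if "binds" in d else []
)
```

In this toy run the first document is dropped because the extractor already finds an event in it. The second document is ranked above the third because the unseen word "phosphorylations" is mapped to a known trigger word and the document also mentions a candidate argument.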
Pages: 18
Related papers
50 in total
  • [1] Active learning for ontological event extraction incorporating named entity recognition and unknown word handling
    Xu Han
    Jung-jae Kim
    Chee Keong Kwoh
    [J]. Journal of Biomedical Semantics, 7
  • [2] Incorporating word-set attention into Chinese named entity recognition Method
    Zhong, Shi-Sheng
    Chen, Xi
    Zhao, Ming-Hang
    Zhang, Yong-Jian
    [J]. Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2022, 52 (05): 1098-1105
  • [3] Joint Learning of Named Entity Recognition and Relation Extraction
    Xu, Qiuyan
    Li, Fang
    [J]. 2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012: 1978-1982
  • [4] Active Learning Technique for Biomedical Named Entity Extraction
    Saha, Sriparna
    Ekbal, Asif
    Verma, Mridula
    Sikdar, Utpal
    Poesio, Massimo
    [J]. PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012: 835-841
  • [5] Domain Adaptation with Active Learning for Named Entity Recognition
    Sun, Huiyu
    Grishman, Ralph
    Wang, Yingchao
    [J]. CLOUD COMPUTING AND SECURITY, ICCCS 2016, PT II, 2016, 10040: 611-622
  • [6] Adversarial Active Learning for Named Entity Recognition in Cybersecurity
    Li, Tao
    Hu, Yongjin
    Ju, Ankang
    Hu, Zhuoran
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (01): 407-420
  • [7] Active Machine Learning Technique For Named Entity Recognition
    Ekbal, Asif
    Saha, Sriparna
    Singh, Dhirendra
    [J]. PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012: 180-186
  • [8] Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
    Tsendsuren Munkhdalai
    Meijing Li
    Khuyagbaatar Batsuren
    Hyeon Ah Park
    Nak Hyeon Choi
    Keun Ho Ryu
    [J]. Journal of Cheminformatics, 7
  • [9] Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
    Munkhdalai, Tsendsuren
    Li, Meijing
    Batsuren, Khuyagbaatar
    Park, Hyeon Ah
    Choi, Nak Hyeon
    Ryu, Keun Ho
    [J]. JOURNAL OF CHEMINFORMATICS, 2015, 7
  • [10] Named Entity Recognition and Event Extraction in Chinese Electronic Medical Records
    Ma, Cheng
    Huang, Wenkang
    [J]. CCKS 2021 - EVALUATION TRACK, 2022, 1553: 133-138