Active learning for ontological event extraction incorporating named entity recognition and unknown word handling

Cited by: 4
Authors
Han, Xu [1]
Kim, Jung-jae [2]
Kwoh, Chee Keong [1]
Affiliations
[1] Nanyang Technological University, School of Computer Engineering, 50 Nanyang Ave, Singapore 639798, Singapore
[2] Institute for Infocomm Research, Data Analytics Department, 1 Fusionopolis Way, Singapore 138632, Singapore
Keywords
Active learning; Biomedical natural language processing; Information extraction
DOI
10.1186/s13326-016-0059-z
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Biomedical text mining can target many kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which state-of-the-art supervised learning systems require. Active learning selects the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on entity recognition and protein-protein interaction tasks, not on event extraction tasks covering multiple event types. It also did not consider evidence of event participants, which might be a clue to the presence of events in unlabeled documents. Moreover, the confidence scores that event extraction systems assign to events are not reliable for ranking documents by informativity for supervised learning. We propose a novel committee-based active learning method that supports multi-event extraction tasks and uses a new statistical method for informativity estimation instead of the confidence scores from event extraction systems.

Methods: Our method is based on a committee of two systems. We first employ an event extraction system to filter potential false negatives among unlabeled documents, that is, documents from which the system does not extract any event. We then develop a statistical method to rank these potential false negatives 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further handles unknown words in test data by using word similarity measures. We also apply our active learning method to the task of named entity recognition.

Results and conclusion: We evaluate the proposed method on the BioNLP Shared Tasks datasets and show that it achieves better performance than previous methods, namely entropy-based and Gibbs error-based methods and a conventional committee-based method. We also show that incorporating named entity recognition into active learning for event extraction, together with unknown word handling, further improves the active learning method. In addition, adapting the active learning method to named entity recognition tasks also improves document selection for manual annotation of named entities.
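
The selection procedure outlined under Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the add-one-smoothed unigram language model, the character-bigram Dice similarity used for unknown words, the additive combination with entity density, and every name below (`dice`, `map_unknown`, `event_likelihood`, `rank_for_annotation`, `entity_weight`, `threshold`) are assumptions chosen for illustration. The event extractor and the named entity recognizer are passed in as plain callables.

```python
import math
from collections import Counter


def char_bigrams(word):
    """Character bigrams, used here as a stand-in word similarity feature."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over character bigrams (0.0 if either set is empty)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def map_unknown(word, counts, threshold=0.5):
    """Map an unseen word to its most similar known word, if similar enough."""
    if word in counts:
        return word
    best = max(counts, key=lambda known: dice(word, known), default=word)
    return best if dice(word, best) >= threshold else word


def event_likelihood(tokens, counts):
    """Average add-one-smoothed unigram log-probability of the tokens.

    `counts` holds word counts from text known to express events (assumed).
    """
    total, vocab = sum(counts.values()), len(counts) + 1
    mapped = [map_unknown(t, counts) for t in tokens]
    logp = sum(math.log((counts[t] + 1) / (total + vocab)) for t in mapped)
    return logp / max(len(tokens), 1)


def rank_for_annotation(docs, counts, extract_events, find_entities,
                        entity_weight=1.0):
    """Rank documents with no extracted events, most informative first."""
    # Step 1: keep potential false negatives (extractor found nothing).
    candidates = [d for d in docs if not extract_events(d)]

    # Step 2: score by language-model likelihood plus entity density.
    def informativity(tokens):
        density = len(find_entities(tokens)) / max(len(tokens), 1)
        return event_likelihood(tokens, counts) + entity_weight * density

    return sorted(candidates, key=informativity, reverse=True)
```

Documents at the head of the returned list would be sent for manual annotation first. In practice the two callables would wrap a trained event extraction system and a named entity recognition system, and the unigram model would be replaced by whatever language model is available.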
Pages: 18