Active learning for ontological event extraction incorporating named entity recognition and unknown word handling

Cited by: 4
Authors
Han, Xu [1]
Kim, Jung-jae [2]
Kwoh, Chee Keong [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
[2] Inst Infocomm Res, Data Analyt Dept, 1 Fusionopolis Way, Singapore 138632, Singapore
Keywords
Active learning; Biomedical natural language processing; Information extraction
DOI
10.1186/s13326-016-0059-z
Chinese Library Classification
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
Background: Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to extending the mining targets is the cost of manually constructing labeled data, which state-of-the-art supervised learning systems require. Active learning chooses the most informative documents for supervised learning in order to reduce the amount of manual annotation required. Previous work on active learning, however, focused on entity recognition and protein-protein interaction tasks, not on event extraction tasks for multiple event types. It also did not consider the evidence of event participants, which might be a clue to the presence of events in unlabeled documents. Moreover, the confidence scores that event extraction systems produce for events are not reliable for ranking documents by informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of the confidence scores from event extraction systems.
Methods: Our method is based on a committee of two systems. We first employ an event extraction system to filter potential false negatives among unlabeled documents, that is, documents from which the system does not extract any event. We then develop a statistical method to rank these potential false negatives 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g., proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method to the task of named entity recognition.
Results and conclusion: We evaluate the proposed method on the BioNLP Shared Tasks datasets and show that it achieves better performance than previous methods, namely entropy-based and Gibbs-error-based methods and a conventional committee-based method. We also show that incorporating named entity recognition into the active learning for event extraction, together with the unknown word handling, further improves the active learning method. In addition, adapting the active learning method to named entity recognition tasks also improves the document selection for manual annotation of named entities.
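The selection procedure outlined in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the language model, the similarity measure, or how the two signals are combined, so everything below is an assumption made for illustration. In particular, the add-one-smoothed unigram models, the character-bigram Dice similarity, the `entity_weight` parameter, and the function names (`map_unknown`, `rank_candidates`, and so on) are hypothetical, and `extractor` and `ner` stand in for whatever event extraction and named entity recognition systems are available.

```python
import math
from collections import Counter


def dice_similarity(a, b):
    """Character-bigram Dice coefficient (an assumed stand-in similarity measure)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def map_unknown(word, vocab, threshold=0.5):
    """Replace an unknown word with its most similar known word, if similar enough."""
    if word in vocab:
        return word
    best = max(vocab, key=lambda v: dice_similarity(word, v), default=None)
    if best is not None and dice_similarity(word, best) >= threshold:
        return best
    return word


def train_unigram(sentences):
    """Count word frequencies in tokenized text labeled with one event type."""
    return Counter(w for sentence in sentences for w in sentence)


def event_log_prob(tokens, counts, vocab_size):
    """Average add-one-smoothed unigram log-probability of the tokens."""
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    logs = (math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return sum(logs) / len(tokens)


def rank_candidates(unlabeled, extractor, ner, models, entity_weight=1.0):
    """Rank documents with no extracted event by an assumed informativity score.

    unlabeled: dict of doc_id -> list of tokens
    extractor: callable returning the events found in a token list
    ner:       callable returning the entity mentions found in a token list
    models:    non-empty dict of event type -> Counter from train_unigram
    """
    vocab = set().union(*(m.keys() for m in models.values()))
    scored = []
    for doc_id, tokens in unlabeled.items():
        if extractor(tokens):
            continue  # events were extracted, so not a potential false negative
        mapped = [map_unknown(t, vocab) for t in tokens]
        # Best-matching event type; +1 reserves probability mass for unseen words.
        lm_score = max(
            event_log_prob(mapped, m, len(vocab) + 1) for m in models.values()
        )
        # Density of candidate event arguments (e.g. proteins) in the document.
        entity_score = len(ner(tokens)) / max(len(tokens), 1)
        scored.append((lm_score + entity_weight * entity_score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

Documents at the head of the returned list would be the ones sent for manual annotation first. Taking the maximum over event types is a simplification; a sum or a weighted combination over types would be an equally plausible reading of "multiple events".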
Pages: 18