Accelerating the annotation of sparse named entities by dynamic sentence selection

Cited by: 7
Authors
Tsuruoka, Yoshimasa [1, 2]
Tsujii, Jun'ichi [1, 2, 3]
Ananiadou, Sophia [1, 2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, MIB, Manchester M1 7DN, Lancs, England
[2] MIB, Natl Ctr Text Min NaCTeM, Manchester M1 7DN, Lancs, England
[3] Univ Tokyo, Dept Comp Sci, Bunkyo Ku, Tokyo, Japan
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
Conditional Random Field; Annotation Process; Target Category; Entity Recognition; Annotate Corpus
DOI
10.1186/1471-2105-9-S11-S8
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it difficult for machine learning-based named entity recognizers to be used in building practical information extraction systems.

Results: This paper presents an active learning-like framework for reducing the human effort required to create named entity annotations in a corpus. In this framework, the annotation work is performed as an iterative and interactive process between the human annotator and a probabilistic named entity tagger. Unlike active learning, our framework aims to annotate all occurrences of the target named entities in the given corpus, so that the resulting annotations are free from the sampling bias which is inevitable in active learning approaches.

Conclusion: We evaluate our framework by simulating the annotation process using two named entity corpora and show that our approach can reduce the number of sentences which need to be examined by the human annotator. The cost reduction achieved by the framework could be drastic when the target named entities are sparse.
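The iterative process described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the tagger's features, the selection rule, or the stopping criterion. Here a toy token-frequency model stands in for the probabilistic tagger (the keywords suggest the paper uses a conditional random field), gold labels stand in for the human annotator, and the names `token_prob`, `sentence_score`, `simulate_annotation`, `batch_size`, and `threshold` are all hypothetical.

```python
from collections import Counter


def token_prob(token, pos, seen, prior=0.05, strength=1.0):
    """Smoothed estimate that `token` belongs to a target entity.
    The prior and smoothing strength are illustrative assumptions."""
    return (pos[token] + prior * strength) / (seen[token] + strength)


def sentence_score(tokens, pos, seen):
    """Estimated probability that the sentence contains at least one entity
    token, treating tokens as independent (a simplifying assumption)."""
    p_none = 1.0
    for tok in tokens:
        p_none *= 1.0 - token_prob(tok, pos, seen)
    return 1.0 - p_none


def simulate_annotation(corpus, gold, seed_ids, batch_size=2, threshold=0.3):
    """Simulate the annotator/tagger loop.

    corpus: list of token lists.
    gold:   list of sets of entity tokens (stands in for the human annotator).
    Returns the ids of the sentences the annotator examined, in order.
    """
    pos, seen = Counter(), Counter()
    examined = []

    def annotate(i):
        # The "annotator" labels sentence i; the toy tagger updates its counts.
        examined.append(i)
        for tok in corpus[i]:
            seen[tok] += 1
            if tok in gold[i]:
                pos[tok] += 1

    for i in seed_ids:
        annotate(i)

    while True:
        done = set(examined)
        scores = {i: sentence_score(corpus[i], pos, seen)
                  for i in range(len(corpus)) if i not in done}
        # Pick the most promising unexamined sentences for the next round.
        batch = sorted(scores, key=scores.get, reverse=True)[:batch_size]
        batch = [i for i in batch if scores[i] >= threshold]
        if not batch:
            # Assumed stopping rule: no remaining sentence looks likely
            # to contain a target entity.
            break
        for i in batch:
            annotate(i)
    return examined


# Toy corpus in which the target entity ("p53") is sparse.
corpus = [
    "p53 binds DNA".split(),
    "the cell divides".split(),
    "p53 regulates the cell cycle".split(),
    "results are shown below".split(),
    "mutant p53 is inactive".split(),
    "the method is simple".split(),
]
gold = [{"p53"}, set(), {"p53"}, set(), {"p53"}, set()]

examined = simulate_annotation(corpus, gold, seed_ids=[0, 1])
print(sorted(examined), "of", len(corpus), "sentences examined")
```

In this toy run the loop stops before the annotator has looked at every sentence, while every sentence that contains the target entity has been examined. A threshold-based stop like the one assumed here cannot guarantee full coverage in general, so how the stopping point is chosen matters if the goal is to annotate all occurrences.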
Pages: 10
Journal: BMC Bioinformatics, 9