Multi-domain evaluation framework for named entity recognition tools

Cited by: 9
Authors
Abdallah, Zahraa S. [1]
Carman, Mark [1]
Haffari, Gholamreza [1]
Affiliations
[1] Monash Univ, Sch Informat Technol, Clayton, Vic, Australia
Source
Keywords
Named entity recognition; Multi-domain evaluation; Qualitative data analysis; Benchmark evaluation;
DOI
10.1016/j.csl.2016.10.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting structured information from unstructured text is important for qualitative data analysis. Leveraging NLP techniques for qualitative data analysis can accelerate the annotation process, allow for large-scale analysis, and provide more insight into the text, improving performance. The first step in gaining insights from text is Named Entity Recognition (NER). A significant challenge that directly affects NER performance is the domain diversity of qualitative data: text varies across domains in many respects, including taxonomies, length, formality and format. In this paper we discuss and analyse the performance of state-of-the-art tools across domains to assess their robustness and reliability. To do so, we developed a standard, expandable and flexible framework for analysing and testing tool performance using corpora that represent text from various domains. We performed extensive analysis and comparison of the tools across these domains and from various perspectives. The resulting comparison and analysis provide a holistic picture of the state-of-the-art tools. © 2016 Elsevier Ltd. All rights reserved.
Pages: 34-55
Number of pages: 22
Related papers
50 in total
  • [31] Multi-Grained Named Entity Recognition
    Xia, Congying
    Zhang, Chenwei
    Yang, Tao
    Li, Yaliang
    Du, Nan
    Wu, Xian
    Fan, Wei
    Ma, Fenglong
    Yu, Philip
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1430 - 1440
  • [32] Creating a Dataset for Named Entity Recognition in the Archaeology Domain
    Brandsen, Alex
    Verberne, Suzan
    Wansleeben, Milco
    Lambers, Karsten
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4573 - 4577
  • [33] Domain Adaptation with Active Learning for Named Entity Recognition
    Sun, Huiyu
    Grishman, Ralph
    Wang, Yingchao
    CLOUD COMPUTING AND SECURITY, ICCCS 2016, PT II, 2016, 10040 : 611 - 622
  • [34] Domain Adaptation for Named Entity Recognition Using CRFs
    Tian, Tian
    Dinarelli, Marco
    Tellier, Isabelle
    Cardoso, Pedro Dias
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 561 - 565
  • [35] Named entity recognition in the legal domain for ontology population
    Bruckschen, Mirian
    Northfleet, Caio
    da Silva, Douglas
    Bridi, Paulo
    Granada, Roger
    Vieira, Renata
    Rao, Prasad
    Sander, Tomas
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : I16 - I21
  • [36] Towards reliable named entity recognition in the biomedical domain
    Giorgi, John M.
    Bader, Gary D.
    BIOINFORMATICS, 2020, 36 (01) : 280 - 286
  • [38] MMBERT: a unified framework for biomedical named entity recognition
    Fu, Lei
    Weng, Zuquan
    Zhang, Jiheng
    Xie, Haihe
    Cao, Yiqing
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (01) : 327 - 341
  • [39] Evaluation of Named Entity Recognition in Handwritten Documents
    Villanova-Aparisi, David
    Martinez-Hinarejos, Carlos-D
    Romero, Veronica
    Pastor-Gadea, Moises
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 568 - 582
  • [40] Evaluation of Named Entity Recognition in Spanish with OpenCalais
    Toribio, Raquel
    Martinez, Paloma
    de Pablo-Sanchez, Cesar
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (45) : 287 - 290