Multi-domain evaluation framework for named entity recognition tools

Cited by: 9
Authors
Abdallah, Zahraa S. [1]
Carman, Mark [1]
Haffari, Gholamreza [1]
Affiliations
[1] Monash Univ, Sch Informat Technol, Clayton, Vic, Australia
Keywords
Named entity recognition; Multi-domain evaluation; Qualitative data analysis; Benchmark evaluation
DOI
10.1016/j.csl.2016.10.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting structured information from unstructured text is important for qualitative data analysis. Leveraging NLP techniques for qualitative data analysis can accelerate the annotation process, allow for large-scale analysis, and provide more insight into the text to improve performance. The first step in gaining insight from text is Named Entity Recognition (NER). A significant challenge that directly affects NER performance is the domain diversity of qualitative data: text varies by domain in many respects, including taxonomies, length, formality, and format. In this paper we discuss and analyse the performance of state-of-the-art tools across domains to assess their robustness and reliability. To do so, we developed a standard, expandable, and flexible framework to analyse and test tool performance using corpora representing text across various domains. We performed extensive analysis and comparison of the tools across domains and from various perspectives. The resulting comparison and analysis provide a holistic picture of the state-of-the-art tools. © 2016 Elsevier Ltd. All rights reserved.
Pages: 34-55
Number of pages: 22