HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition

Cited by: 45
Authors
Weber, Leon [1, 2]
Saenger, Mario [1]
Munchmeyer, Jannes [1, 3]
Habibi, Maryam [1]
Leser, Ulf [1]
Akbik, Alan [1]
Affiliations
[1] Humboldt Univ, Comp Sci Dept, D-10099 Berlin, Germany
[2] Helmholtz Assoc, Grp Math Modelling Cellular Proc, Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany
[3] GFZ German Res Ctr Geosci, Sect Seismol, D-14473 Potsdam, Germany
DOI
10.1093/bioinformatics/btab042
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate and be robust toward variations in text genre and style. We present HunFlair, a NER tagger fulfilling these requirements. HunFlair is integrated into the widely used NLP framework Flair, recognizes five biomedical entity types, reaches or overcomes state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias. Technically, it uses a character-level language model pretrained on roughly 24 million biomedical abstracts and three million full texts. It outperforms other off-the-shelf biomedical NER tools with an average gain of 7.26 pp over the next best tool in a cross-corpus setting and achieves on-par results with state-of-the-art research prototypes in in-corpus experiments. HunFlair can be installed with a single command and is applied with only four lines of code. Furthermore, it is accompanied by harmonized versions of 23 biomedical NER corpora.
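The abstract's claim of single-command installation and four-line usage can be sketched as follows. This is a minimal illustration, not code taken from the paper: it assumes Flair was installed with `pip install flair`, and that the installed release still provides `flair.models.MultiTagger` and the `"hunflair"` model name (later Flair releases may expose the models through a different loader). The wrapper function `tag_text` is a hypothetical helper introduced here for illustration.

```python
def tag_text(text, model_name="hunflair"):
    """Tag biomedical entities in `text` and return Flair's tagged-string rendering.

    Assumption: a Flair release that ships MultiTagger and the "hunflair" models.
    """
    # Imported lazily so that defining this helper does not require Flair.
    from flair.data import Sentence
    from flair.models import MultiTagger

    sentence = Sentence(text)              # tokenize the input text
    tagger = MultiTagger.load(model_name)  # downloads the models on first use
    tagger.predict(sentence)               # annotate the sentence in place
    return sentence.to_tagged_string()     # text with predicted entity labels
```

Calling `tag_text("Behavioral abnormalities in the Fmr1 KO2 mouse model of Fragile X Syndrome")` should return the sentence annotated with whatever entity labels the loaded models predict. The first call downloads the pretrained models, so it needs network access.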
Pages: 2792-2794
Page count: 3