HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition

Cited by: 45

Authors
Weber, Leon [1, 2]
Saenger, Mario [1]
Munchmeyer, Jannes [1, 3]
Habibi, Maryam [1]
Leser, Ulf [1]
Akbik, Alan [1]
Affiliations
[1] Humboldt Univ, Comp Sci Dept, D-10099 Berlin, Germany
[2] Helmholtz Assoc, Grp Math Modelling Cellular Proc, Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany
[3] GFZ German Res Ctr Geosci, Sect Seismol, D-14473 Potsdam, Germany
DOI
10.1093/bioinformatics/btab042
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Named entity recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, be highly accurate and be robust toward variations in text genre and style. We present HunFlair, a NER tagger fulfilling these requirements. HunFlair is integrated into the widely used NLP framework Flair, recognizes five biomedical entity types, reaches or overcomes state-of-the-art performance on a wide set of evaluation corpora, and is trained in a cross-corpus setting to avoid corpus-specific bias. Technically, it uses a character-level language model pretrained on roughly 24 million biomedical abstracts and three million full texts. It outperforms other off-the-shelf biomedical NER tools with an average gain of 7.26 pp over the next best tool in a cross-corpus setting and achieves on-par results with state-of-the-art research prototypes in in-corpus experiments. HunFlair can be installed with a single command and is applied with only four lines of code. Furthermore, it is accompanied by harmonized versions of 23 biomedical NER corpora.
Pages: 2792-2794
Page count: 3