DTranNER: biomedical named entity recognition with deep learning-based label-label transition model

Cited by: 22
Authors
Hong, S. K. [1]
Lee, Jae-Gil [1, 2]
Affiliations
[1] Korea Adv Inst Sci & Technol, Grad Sch Knowledge Serv Engn, 291 Daehak Ro, Daejeon 34141, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, 291 Daehak Ro, Daejeon 34141, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Bioinformatics; Data mining; Named entity recognition; Neural network; NETWORKS;
DOI
10.1186/s12859-020-3393-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Biomedical named-entity recognition (BioNER) is widely modeled with conditional random fields (CRF) by treating it as a sequence labeling problem. CRF-based methods yield structured label outputs by imposing connectivity between the labels. Recent BioNER studies have reported state-of-the-art performance by combining deep learning-based models (e.g., bidirectional Long Short-Term Memory) with CRF. In these CRF-based methods, the deep learning-based models are dedicated to estimating individual labels, whereas the relationships between connected labels are described as static numbers; as a result, the model cannot reflect the context of a given input sentence when generating the most plausible label-label transitions. Meanwhile, correctly segmenting entity mentions in biomedical texts is challenging because biomedical terms are often descriptive and long compared with general terms. Therefore, limiting the label-label transitions to static numbers is a bottleneck for improving BioNER performance.

Results: We introduce DTranNER, a novel CRF-based framework that incorporates a deep learning-based label-label transition model into BioNER. DTranNER uses two separate deep learning-based networks: Unary-Network and Pairwise-Network. The former models the input to determine individual labels, and the latter explores the context of the input to describe the label-label transitions. We performed experiments on five benchmark BioNER corpora. Compared with current state-of-the-art methods, DTranNER achieves the best F1-score of 84.56% (versus 84.40%) on the BioCreative II gene mention (BC2GM) corpus, the best F1-score of 91.99% (versus 91.41%) on the BioCreative IV chemical and drug (BC4CHEMD) corpus, the best F1-scores of 94.16% (versus 93.44%) on chemical NER and 87.22% (versus 86.56%) on disease NER of the BioCreative V chemical disease relation (BC5CDR) corpus, and a near-best F1-score of 88.62% on the NCBI-Disease corpus.

Conclusions: Our results indicate that incorporating the deep learning-based label-label transition model provides distinctive contextual clues that enhance BioNER over the static transition model. We demonstrate that the proposed framework enables the dynamic transition model to adaptively explore the contextual relations between adjacent labels in a fine-grained way. We expect that our study can be a stepping stone for further progress in biomedical literature mining.
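
The contrast the abstract draws between static and context-dependent label-label transitions can be illustrated with a small decoding sketch. This is not the authors' implementation: the tag set, the toy scores, and the function name `viterbi` are all assumptions made for illustration, with hand-written numbers standing in for what a Unary-Network and a Pairwise-Network would produce. The only point is that a CRF decoder can take one transition table per position instead of a single shared table.

```python
# Illustrative sketch only: toy scores stand in for network outputs.
# LABELS, viterbi, and all numbers below are hypothetical, not from the paper.

LABELS = ["O", "B", "I"]  # assumed tag set for the example


def viterbi(unary, pairwise):
    """Return the highest-scoring label sequence.

    unary[t][j]       : score of label j at position t
    pairwise[t][i][j] : score of moving from label i at position t
                        to label j at position t + 1
    """
    n, k = len(unary), len(unary[0])
    best = list(unary[0])  # best score of any path ending in each label
    back = []              # back-pointers, one list per position t >= 1
    for t in range(1, n):
        new_best, pointers = [], []
        for j in range(k):
            cands = [best[i] + pairwise[t - 1][i][j] for i in range(k)]
            i_star = max(range(k), key=lambda i: cands[i])
            new_best.append(cands[i_star] + unary[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(k), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [LABELS[j] for j in path]


# Three tokens; the last one is ambiguous between "O" (0.5) and "I" (0.4).
unary = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.5, 0.0, 0.4]]

# Static transitions: one table reused at every position (O -> I is penalized).
static = [[0.0, 0.0, -5.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
static_pairwise = [static, static]

# Context-dependent transitions: the second table favors I -> I, as if the
# context suggested that the entity mention continues.
contextual = [[0.0, 0.0, -5.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5]]
dynamic_pairwise = [static, contextual]

static_path = viterbi(unary, static_pairwise)
dynamic_path = viterbi(unary, dynamic_pairwise)
print(static_path)
print(dynamic_path)
```

With the shared table the ambiguous last token falls outside the mention, while the position-specific table extends the mention to cover it, which is the kind of segmentation decision the abstract attributes to the transition model.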
Pages: 11