Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition

Cited by: 70
Authors
Xu, Kai [1]
Yang, Zhenguo [1,2]
Kang, Peipei [1]
Wang, Qi [1]
Liu, Wenyin [1]
Affiliations
[1] Guangdong Univ Technol, Dept Comp Sci, Guangzhou, Guangdong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Biomedical informatics; Named entity recognition; String matching; Machine learning; Neural network; CONDITIONAL RANDOM-FIELDS; NORMALIZATION; EXTRACTION; COVERAGE; METAMAP; MODEL; TEXT;
DOI
10.1016/j.compbiomed.2019.04.002
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain to be addressed; among these, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.
Methods: We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. Furthermore, DABLC constructs a dictionary attention layer by combining the disease dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) and the dictionary attention layer is proposed, incorporating the disease dictionary into disease NER.
Results: Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each model is run 10 times per test, and results are reported with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
Conclusion: DABLC combines the advantages of both external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names; moreover, it reduces the impact of tagging inconsistency. Special disease NER and deep learning models addressing long sentences are noteworthy areas for future examination.
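The dictionary matching step mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says an "efficient exact string matching method" is used, so the greedy longest-match strategy, the B/I/O feature labels, the function names `build_dictionary` and `dictionary_tags`, and the toy dictionary entries below are all assumptions made for illustration. In DABLC the dictionary is built from the Disease Ontology, and the match information feeds a dictionary attention layer that is not shown here.

```python
from typing import List, Set, Tuple


def build_dictionary(names: List[str]) -> Tuple[Set[Tuple[str, ...]], int]:
    """Store each disease name as a lowercased token tuple (hypothetical helper)."""
    entries = {tuple(name.lower().split()) for name in names if name.strip()}
    max_len = max((len(entry) for entry in entries), default=0)
    return entries, max_len


def dictionary_tags(tokens: List[str],
                    entries: Set[Tuple[str, ...]],
                    max_len: int) -> List[str]:
    """Tag tokens with B/I/O using greedy longest exact match (an assumed strategy)."""
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


# Toy dictionary; these entries are illustrative, not taken from the paper.
entries, max_len = build_dictionary(["breast cancer", "cancer", "cystic fibrosis"])
sentence = "Hereditary breast cancer and cystic fibrosis".split()
print(list(zip(sentence, dictionary_tags(sentence, entries, max_len))))
```

Tags of this kind could serve as an extra per-token input feature alongside word embeddings; how DABLC actually encodes the match result is not specified in the abstract.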
Pages: 122-132
Number of pages: 11
Related papers
50 in total
  • [1] An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition
    Luo, Ling
    Yang, Zhihao
    Yang, Pei
    Zhang, Yin
    Wang, Lei
    Lin, Hongfei
    Wang, Jian
    [J]. BIOINFORMATICS, 2018, 34 (08) : 1381 - 1388
  • [2] An Attention-Based BiLSTM-CRF Model for Chinese Clinic Named Entity Recognition
    Wu, Guohua
    Tang, Guangen
    Wang, Zhongru
    Zhang, Zhen
    Wang, Zhen
    [J]. IEEE ACCESS, 2019, 7 : 113942 - 113949
  • [3] Named Entity Recognition From Biomedical Texts Using a Fusion Attention-Based BiLSTM-CRF
    Wei, Hao
    Gao, Mingyuan
    Zhou, Ai
    Chen, Fei
    Qu, Wen
    Wang, Chunli
    Lu, Mingyu
    [J]. IEEE ACCESS, 2019, 7 : 73627 - 73636
  • [4] Named entity recognition of agricultural based entity-level masking BERT and BiLSTM-CRF
    Wei, Zijun
    Song, Ling
    Hu, Xiaochun
    Chen, Ningjiang
    [J]. Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2022, 38 (15) : 195 - 203
  • [5] BiLSTM-CRF for Persian Named-Entity Recognition
    Poostchi, Hanieh
    Borzeshi, Ehsan Zare
    Piccardi, Massimo
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018 : 4427 - 4431
  • [6] Arabic named entity recognition in social media based on BiLSTM-CRF using an attention mechanism
    Benali, B. Ait
    Mihi, S.
    Mlouk, A. Ait
    El Bazi, I
    Laachfoubi, N.
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (06) : 5427 - 5436
  • [7] Named Entity Recognition of Traditional Chinese Medicine Patents Based on BiLSTM-CRF
    Deng, Na
    Fu, Hao
    Chen, Xu
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021
  • [8] BiLSTM-CRF for geological named entity recognition from the geoscience literature
    Qinjun Qiu
    Zhong Xie
    Liang Wu
    Liufeng Tao
    Wenjia Li
    [J]. Earth Science Informatics, 2019, 12 : 565 - 579
  • [9] Drug Specification Named Entity Recognition base on BiLSTM-CRF Model
    Li, Wei-Yan
    Song, Wen-Ai
    Jia, Xin-Hong
    Yang, Ji-Jiang
    Wang, Qing
    Lei, Yi
    Huang, Ke
    Li, Jun
    Yang, Ting
    [J]. 2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 2, 2019 : 429 - 433
  • [10] BiLSTM-CRF for geological named entity recognition from the geoscience literature
    Qiu, Qinjun
    Xie, Zhong
    Wu, Liang
    Tao, Liufeng
    Li, Wenjia
    [J]. EARTH SCIENCE INFORMATICS, 2019, 12 (04) : 565 - 579