Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition

Cited by: 70
Authors
Xu, Kai [1]
Yang, Zhenguo [1, 2]
Kang, Peipei [1]
Wang, Qi [1]
Liu, Wenyin [1]
Affiliations
[1] Guangdong University of Technology, Department of Computer Science, Guangzhou, Guangdong, People's Republic of China
[2] City University of Hong Kong, Department of Computer Science, Hong Kong, People's Republic of China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Biomedical informatics; Named entity recognition; String matching; Machine learning; Neural network; CONDITIONAL RANDOM-FIELDS; NORMALIZATION; EXTRACTION; COVERAGE; METAMAP; MODEL; TEXT;
DOI
10.1016/j.compbiomed.2019.04.002
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain open; among these, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.
Methods: We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. DABLC then builds a dictionary attention layer by combining the dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is extended with this dictionary attention layer, so that the disease dictionary is incorporated into disease NER.
Results: Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each test is applied to 10 executions of each model, with a 95% confidence interval. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.
Conclusion: DABLC combines the advantages of external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names, and it reduces the impact of tagging inconsistency. Specialized disease NER and deep learning models that handle long sentences are noteworthy areas for future examination.
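The dictionary-matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says that an "efficient exact string matching method" is used, so the greedy longest-match strategy, the function names `build_dictionary` and `tag_with_dictionary`, the BIO tag names, and the three toy terms below are all assumptions made for illustration, not the authors' implementation or entries taken from their Disease Ontology dictionary. In DABLC the match signal feeds a dictionary attention layer; here it is simply emitted as tags.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Dictionary = Dict[Tuple[str, ...], str]


def build_dictionary(terms: Sequence[str]) -> Dictionary:
    """Index each term by its lower-cased, whitespace-split token tuple."""
    return {tuple(term.lower().split()): term for term in terms}


def tag_with_dictionary(
    tokens: Sequence[str],
    dictionary: Dictionary,
    max_len: Optional[int] = None,
) -> List[str]:
    """Greedy longest exact match of token spans against the dictionary.

    Returns one BIO tag per token (hypothetical tag names).
    """
    if max_len is None:
        # Longest dictionary entry, measured in tokens.
        max_len = max((len(key) for key in dictionary), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so that a multi-word
        # name wins over a shorter term nested inside it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B-Disease"
            for j in range(i + 1, i + matched):
                tags[j] = "I-Disease"
            i += matched
        else:
            i += 1
    return tags


# Toy dictionary for illustration only.
toy_dictionary = build_dictionary(["breast cancer", "cancer", "cystic fibrosis"])
sentence = "Mutations linked to breast cancer and cystic fibrosis".split()
tags = tag_with_dictionary(sentence, toy_dictionary)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

Matching the longest span first means "breast cancer" is tagged as a single two-token entity rather than only its nested term "cancer", which is one simple way a dictionary can help with multi-word disease names.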
Pages: 122-132
Page count: 11