Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition

Cited by: 70
Authors
Xu, Kai [1]
Yang, Zhenguo [1, 2]
Kang, Peipei [1]
Wang, Qi [1]
Liu, Wenyin [1]
Affiliations
[1] Guangdong Univ Technol, Dept Comp Sci, Guangzhou, Guangdong, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Biomedical informatics; Named entity recognition; String matching; Machine learning; Neural network; Conditional random fields; Normalization; Extraction; Coverage; MetaMap; Model; Text
DOI
10.1016/j.compbiomed.2019.04.002
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Disease named entity recognition (NER) plays an important role in biomedical research. Many challenging issues remain to be addressed; among them, the identification of rare diseases and complex disease names and the problem of tagging inconsistency (i.e., the same entity being tagged differently within a document) are attracting substantial research attention.

Methods: We propose a new neural network method named Dic-Att-BiLSTM-CRF (DABLC) for disease NER. DABLC applies an efficient exact string matching method to match disease entities against a disease dictionary constructed from the Disease Ontology. DABLC then builds a dictionary attention layer that combines the disease dictionary matching method with a document-level attention mechanism. Finally, a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF) is equipped with this dictionary attention layer, so that the disease dictionary is incorporated into disease NER.

Results: Extensive experiments are conducted on two widely used corpora: the NCBI disease corpus and the BioCreative V CDR corpus. Each test is performed over 10 runs of each model, with 95% confidence intervals. DABLC achieves the highest F1 scores (NCBI: Precision = 0.883, Recall = 0.89, F1 = 0.886; BioCreative V CDR: Precision = 0.891, Recall = 0.875, F1 = 0.883), outperforming the state-of-the-art methods.

Conclusion: DABLC combines the advantages of external dictionary resources and deep attention neural networks. This aids the identification of rare diseases and complex disease names and reduces the impact of tagging inconsistency. NER for special diseases and deep learning models that handle long sentences are noteworthy directions for future work.
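The abstract names two mechanisms without giving their details: exact string matching of text against a disease dictionary, and a document-level attention layer. Below is a minimal Python sketch of each, under stated assumptions. The dictionary is a toy list rather than one built from the Disease Ontology. Matching is assumed to be case-insensitive greedy longest-match over token n-grams that emits BIO tags. Attention is plain scaled dot-product self-attention over every token vector in a document. The function names (`build_dictionary`, `dictionary_bio_tags`, `document_attention`, `f1`) are hypothetical, and the sketch does not reproduce how DABLC actually fuses the two signals. The last helper only checks that the reported F1 values equal the harmonic mean of the reported precision and recall.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical dictionary format: a set of lowercased token tuples.
Dictionary = Set[Tuple[str, ...]]


def build_dictionary(terms: Sequence[str]) -> Tuple[Dictionary, int]:
    """Index each term as a lowercased token tuple; also return the longest term length."""
    entries = {tuple(term.lower().split()) for term in terms if term.strip()}
    max_len = max((len(e) for e in entries), default=0)
    return entries, max_len


def dictionary_bio_tags(tokens: Sequence[str], entries: Dictionary, max_len: int) -> List[str]:
    """Greedy longest-match exact lookup: 'B'/'I' mark dictionary hits, 'O' everything else."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, so a multi-word term wins
        # over a shorter term that starts at the same token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                match_len = n
                break
        if match_len:
            tags[i] = "B"
            for j in range(i + 1, i + match_len):
                tags[j] = "I"
            i += match_len
        else:
            i += 1
    return tags


def document_attention(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention: every token attends to all tokens in the document."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Each output is the softmax-weighted average of all token vectors.
        outputs.append([
            sum(w * vec[d] for w, vec in zip(weights, vectors)) / total
            for d in range(dim)
        ])
    return outputs


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    entries, max_len = build_dictionary(["breast cancer", "cancer"])
    tokens = "Patients with Breast Cancer and lung cancer".split()
    print(list(zip(tokens, dictionary_bio_tags(tokens, entries, max_len))))
    print(round(f1(0.883, 0.89), 3), round(f1(0.891, 0.875), 3))
```

For the toy sentence, the script tags "Breast Cancer" as B, I and the final "cancer" as B. It then prints `0.886 0.883`, which matches the F1 values reported for the NCBI and BioCreative V CDR corpora. One common way to feed such tags to a BiLSTM-CRF is to embed them and concatenate them with the word features. The paper's actual dictionary attention layer may differ.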
Pages: 122-132
Page count: 11
Related papers (50 in total)
  • [21] Leveraging Document-Level Label Consistency for Named Entity Recognition
    Gui, Tao
    Ye, Jiacheng
    Zhang, Qi
    Zhou, Yaqian
    Gong, Yeyun
    Huang, Xuanjing
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 3976-3982
  • [22] Research on Named Entity Recognition of Doctor-Patient Question Answering Community Based on BiLSTM-CRF Model
    Wang, Zhikang
    Guan, Hua
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020: 1641-1644
  • [23] Domain Named Entity Recognition Combining GAN and BiLSTM-Attention-CRF
    Zhang H.
    Guo Y.
    Li T.
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (09): 1851-1858
  • [24] A named entity recognition method towards product reviews based on BiLSTM-attention-CRF
    Zhang, Shunxiang
    Zhu, Haiyang
    Xu, Hanqing
    Zhu, Guangli
    Li, Kuan Ching
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2022, 25 (05): 479-489
  • [25] Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets
    Greenberg, Nathan
    Bansal, Trapit
    Verga, Patrick
    McCallum, Andrew
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018: 2824-2829
  • [26] Named Entity Recognition in Traditional Chinese Medicine Clinical Cases Combining BiLSTM-CRF with Knowledge Graph
    Jin, Zhe
    Zhang, Yin
    Kuang, Haodan
    Yao, Liang
    Zhang, Wenjin
    Pan, Yunhe
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT I, 2019, 11775: 537-548
  • [27] Exploiting global contextual information for document-level named entity recognition
    Yu, Yiting
    Wang, Zanbo
    Wei, Wei
    Zhang, Ruihan
    Mao, Xian-Ling
    Feng, Shanshan
    Wang, Fei
    He, Zhiyong
    Jiang, Sheng
    [J]. KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [28] Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level
    Garcia, Marcos
    [J]. LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 3357-3361
  • [29] Research on entity recognition and alignment of APT attack based on Bert and BiLSTM-CRF
    Yang, Xiuzhang
    Peng, Guojun
    Li, Zichuan
    Lyu, Yangqi
    Liu, Side
    Li, Chenguang
    [J]. Tongxin Xuebao/Journal on Communications, 2022, 43 (06): 58-70
  • [30] Named entity recognition for Chinese judgment documents based on BiLSTM and CRF
    Huang, Wenming
    Hu, Dengrui
    Deng, Zhenrong
    Nie, Jianyun
    [J]. EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2020, 2020 (01)