Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Cited by: 30
Authors
Munkhdalai, Tsendsuren [1]
Li, Meijing [1]
Batsuren, Khuyagbaatar [1]
Park, Hyeon Ah [1]
Choi, Nak Hyeon [1]
Ryu, Keun Ho [1]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Feature Representation Learning; Semi-Supervised Learning; Named Entity Recognition; Conditional Random Fields;
DOI
10.1186/1758-2946-7-S1-S9
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite for effective text mining of biochemical text. Exploiting unlabeled text to improve system performance has been an active and challenging research topic in text mining because of the recent growth of the biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into a named entity recognition model and improve system performance. The proposed method comprises Natural Language Processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of text data for feature extraction, and conditional random fields for token classification. Apart from free text in the domain, the proposed method relies on no lexicon or dictionary, which keeps the system applicable to other NER tasks in bio-text data.
Results: We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the testing sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and an F-measure of 87.04% on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
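To make the feature-extraction step of the abstract concrete, the following is a minimal sketch of how word representation features might be attached to each token before the tokens are passed to a conditional random field. It is not the BANNER-CHEMDNER implementation: the `WORD_CLUSTERS` table, its bit-string values, the feature names, and the helper functions are all hypothetical and chosen only for illustration. In practice the table would be learned from a large unlabeled corpus.

```python
from typing import Dict, List

# Hypothetical word-to-cluster table standing in for representations learned
# from unlabeled text. The bit strings are made-up illustrative values.
WORD_CLUSTERS: Dict[str, str] = {
    "aspirin": "110100",
    "ibuprofen": "110101",
    "inhibits": "0110",
    "cox-2": "111000",
}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for token i, in a shape a CRF tagger could consume."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        "lower": lower,
        "suffix3": lower[-3:],
        "has_digit": str(any(ch.isdigit() for ch in word)),
        "is_capitalized": str(word[:1].isupper()),
        # Neighbouring words give the sequence model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Word representation features: the full cluster id plus a shorter
    # prefix, so that related words can share a coarser feature.
    cluster = WORD_CLUSTERS.get(lower)
    if cluster is not None:
        feats["cluster"] = cluster
        feats["cluster_p4"] = cluster[:4]
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sentence_features(["Aspirin", "inhibits", "COX-2"])` returns one feature dict per token. Words missing from the table simply receive no cluster features, so no lexicon or dictionary is needed beyond what is learned from free text.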
Pages: 8
Related papers
50 in total
  • [1] Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
    Tsendsuren Munkhdalai
    Meijing Li
    Khuyagbaatar Batsuren
    Hyeon Ah Park
    Nak Hyeon Choi
    Keun Ho Ryu
    Journal of Cheminformatics, 7
  • [2] Knowledge-Graph Augmented Word Representations for Named Entity Recognition
    He, Qizhen
    Wu, Liang
    Yin, Yida
    Cai, Heming
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7919 - 7926
  • [3] Named Entity Recognition System for the Biomedical Domain
    Sharma, Raghav
    Chauhan, Deependra
    Sharma, Raksha
    PROCEEDINGS OF THE 2022 17TH CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS (FEDCSIS), 2022, : 837 - 840
  • [4] A Multichannel Biomedical Named Entity Recognition Model Based on Multitask Learning and Contextualized Word Representations
    Wei, Hao
    Gao, Mingyuan
    Zhou, Ai
    Chen, Fei
    Qu, Wen
    Zhang, Yijia
    Lu, Mingyu
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2020, 2020
  • [5] Towards reliable named entity recognition in the biomedical domain
    Giorgi, John M.
    Bader, Gary D.
    BIOINFORMATICS, 2020, 36 (01) : 280 - 286
  • [6] Medical Named Entity Recognition with Domain Knowledge
    Pei W.
    Sun S.
    Li X.
    Lu J.
    Yang L.
    Wu Y.
    Data Analysis and Knowledge Discovery, 2023, 7 (03) : 142 - 154
  • [7] Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning
    Zhang, Yaoyun
    Xu, Jun
    Chen, Hui
    Wang, Jingqi
    Wu, Yonghui
    Prakasam, Manu
    Xu, Hua
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2016
  • [8] Computational Reproducibility of Named Entity Recognition methods in the biomedical domain
    Garcia-Serrano, Ana
    Hennig, Sebastian
    Nuernberger, Andreas
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2021, (66) : 141 - 152
  • [9] Faster biomedical named entity recognition based on knowledge distillation
    Hu B.
    Geng T.
    Deng G.
    Duan L.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2021, 61 (09) : 936 - 942
  • [10] On the Use of Knowledge Transfer Techniques for Biomedical Named Entity Recognition
    Mehmood, Tahir
    Serina, Ivan
    Lavelli, Alberto
    Putelli, Luca
    Gerevini, Alfonso
    FUTURE INTERNET, 2023, 15 (02)