Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Cited by: 30
Authors
Munkhdalai, Tsendsuren [1]
Li, Meijing [1]
Batsuren, Khuyagbaatar [1]
Park, Hyeon Ah [1]
Choi, Nak Hyeon [1]
Ryu, Keun Ho [1]
Affiliation
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju, South Korea
Source
Funding
National Research Foundation, Singapore;
Keywords
Feature Representation Learning; Semi-Supervised Learning; Named Entity Recognition; Conditional Random Fields;
DOI
10.1186/1758-2946-7-S1-S9
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite for effective text mining of biochemical text. Exploiting unlabeled text to improve system performance has been an active and challenging research topic in text mining, driven by the recent growth of the biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into a named entity recognition model and improve system performance. The proposed method comprises Natural Language Processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of text for feature extraction, and conditional random fields for token classification. Apart from free text in the domain, the method relies on neither a lexicon nor a dictionary, which keeps the system applicable to other NER tasks in bio-text data. Results: We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the testing sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and an F-measure of 87.04% on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
Pages: 8