Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts

Cited by: 14
Authors
Lee, Wangjin [1]
Kim, Kyungmo [1]
Lee, Eun Young [2]
Choi, Jinwook [1, 3, 4]
Affiliations
[1] Seoul Natl Univ, Grad Sch, Interdisciplinary Program Bioengn, 103 Daehak Ro, Seoul 03080, South Korea
[2] Seoul Natl Univ, Coll Med, Dept Internal Med, Div Rheumatol, 103 Daehak Ro, Seoul 03080, South Korea
[3] Seoul Natl Univ, Coll Med, Dept Biomed Engn, 103 Daehak Ro, Seoul 03080, South Korea
[4] Seoul Natl Univ, Med Res Ctr, Inst Med & Biol Engn, 103 Daehak Ro, Seoul 03080, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Clinical named entity recognition; Conditional random field; String matching; Discharge summary; Medical history; INFORMATION EXTRACTION; MEDICATION INFORMATION; IDENTIFICATION;
DOI
10.1016/j.compbiomed.2018.07.019
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07; 0710; 09;
Abstract
Background: This study demonstrates clinical named entity recognition (NER) methods on the clinical texts of rheumatism patients in South Korea. Despite the recent increase in the adoption of electronic health record (EHR) systems by health institutions worldwide, health information technologies for handling and acquiring information from the numerous unstructured texts in EHR systems are still at a developing stage. The aim of this study is to verify the conventional NER methods, namely dictionary-lookup-based string matching and conditional random fields (CRFs). Methods: We selected discharge summaries for 200 rheumatic patients from the EHR system of the Seoul National University Hospital and attempted to identify the heterogeneous semantic types present in the clinical notes of each patient's history. Results: CRFs outperformed string matching in extracting most semantic types (median F1 = 0.761, minimum = 0.705, maximum = 0.906). String matching was better suited to identifying hospital visit information. The two methods performed comparably in identifying medications. In 10-fold cross-validation, CRFs achieved a median F1 = 0.811 (minimum = 0.752, maximum = 0.918) and performed well even when trained with simple features. Conclusion: CRFs are a good candidate for implementing clinical NER in Korean clinical narrative documents. Increasing the training data and incorporating sophisticated feature engineering might improve the accuracy of identifying health information, enabling automated patient history summarization in the future.
Pages: 7-14
Number of pages: 8