Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Cited by: 25
Authors
Love, Thorvardur Jon [1]
Cai, Tianxi [2]
Karlson, Elizabeth W. [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
Keywords
psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY;
DOI
10.1016/j.semarthrit.2010.05.002
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objectives: To test whether data extracted from full-text patient visit notes in an electronic medical record (EMR) would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data.

Methods: Of the more than 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted, and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative notes with natural language processing (NLP), 31 predictors were extracted, and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm. A cut-point was chosen to achieve the maximum possible sensitivity at a positive predictive value (PPV) of 90%. The algorithm was then used to classify the remaining 1768 charts. It was finally validated in a random sample of 300 cases predicted to have PsA.

Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and NLP, the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC curve (P < 0.001).

Conclusions: Using NLP on text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research.

(C) 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
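The Methods step of choosing a cut-point "to achieve the maximum sensitivity possible at a 90% PPV" can be illustrated with a short Python sketch. This is not the authors' code.

The sketch makes these assumptions:
- It takes a list of predicted probabilities, for example from a random forest, and chart-review labels (1 = confirmed PsA, 0 = not PsA).
- A chart is called PsA when its score is at or above the cut-point.
- The function names `ppv_sensitivity` and `choose_cutpoint` and the toy data are hypothetical.
- Confidence intervals are not computed, because the abstract does not state how they were derived.

```python
from typing import List, Optional, Tuple


def ppv_sensitivity(scores: List[float], labels: List[int], cut: float) -> Tuple[float, float]:
    """PPV and sensitivity when every chart with score >= cut is called PsA."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)  # true positives
    fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)  # false positives
    fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)   # missed cases
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sens = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sens


def choose_cutpoint(
    scores: List[float], labels: List[int], target_ppv: float = 0.90
) -> Optional[Tuple[float, float, float]]:
    """Return (cut, ppv, sensitivity) for the most sensitive cut-point whose
    PPV meets target_ppv, or None if no cut-point reaches the target."""
    best = None
    # Each distinct score is a candidate threshold; NaN PPVs fail the >= test.
    for cut in sorted(set(scores)):
        ppv, sens = ppv_sensitivity(scores, labels, cut)
        if ppv >= target_ppv and (best is None or sens > best[2]):
            best = (cut, ppv, sens)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: model scores and chart-review labels.
    scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
    print(choose_cutpoint(scores, labels, target_ppv=0.90))  # (0.85, 1.0, 0.5)
```

Sensitivity can only stay the same or fall as the cut-point rises. Scanning candidate cut-points in ascending order and keeping the first one that meets the PPV target therefore yields the most sensitive admissible threshold. On the toy data, lowering the PPV target to 0.80 moves the chosen cut-point down to 0.6, where the PPV is 5/6 and the sensitivity is 5/6.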
Pages: 413-420
Number of pages: 8