Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Cited by: 25
Authors
Love, Thorvardur Jon [1]
Cai, Tianxi [2]
Karlson, Elizabeth W. [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
Keywords
psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY;
DOI
10.1016/j.semarthrit.2010.05.002
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objectives: To test whether data extracted from full-text patient visit notes in an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data. Methods: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted, and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data by natural language processing (NLP), 31 predictors were extracted and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm, and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and NLP, the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC curve (P < 0.001). Conclusions: Using NLP with text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research. © 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
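The cut-point rule described in the Methods (maximum sensitivity subject to a 90% PPV) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ppv_and_sensitivity` and `choose_cut_point` are hypothetical, the scores stand in for any classifier's predicted probabilities (for example, from a random forest), and the toy data are invented purely for illustration.

```python
from typing import List, Optional, Tuple


def ppv_and_sensitivity(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Return (PPV, sensitivity) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


def choose_cut_point(
    scores: List[float], labels: List[int], min_ppv: float = 0.90
) -> Optional[Tuple[float, float, float]]:
    """Among thresholds whose PPV is at least min_ppv, pick the one with the
    highest sensitivity. Returns (threshold, PPV, sensitivity), or None if
    no threshold reaches the PPV target."""
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate
        ppv, sens = ppv_and_sensitivity(scores, labels, t)
        if ppv >= min_ppv and (best is None or sens > best[2]):
            best = (t, ppv, sens)
    return best


# Invented toy data: predicted probabilities and chart-review labels (1 = PsA).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

best = choose_cut_point(scores, labels, min_ppv=0.90)
print(best)
```

Scanning only the observed scores is sufficient here because PPV and sensitivity change only when the threshold crosses a score; a real analysis would choose the cut-point on training data and then check the PPV on held-out cases, as the abstract describes.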
Pages: 413-420
Page count: 8