Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Cited by: 25
Authors
Love, Thorvardur Jon [1]
Cai, Tianxi [2]
Karlson, Elizabeth W. [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
Keywords
psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY
DOI
10.1016/j.semarthrit.2010.05.002
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objectives: To test whether data extracted from full text patient visit notes from an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data.
Methods: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operator curve was used to identify the optimal algorithm and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA.
Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operator curve (P < 0.001).
Conclusions: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research.
© 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
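The cut-point rule described in the Methods (the highest sensitivity achievable while keeping PPV at or above 90%) can be illustrated with a small sketch. This is not the authors' code: the function name `pick_cutpoint`, the exhaustive scan over observed scores, and the toy data are assumptions made for illustration. The scores stand in for predicted probabilities from any classifier, such as a random forest, and the labels stand in for chart-review results.

```python
from typing import List, Optional, Tuple


def pick_cutpoint(
    scores: List[float],
    labels: List[int],
    target_ppv: float = 0.90,
) -> Optional[Tuple[float, float, float]]:
    """Return (cutoff, ppv, sensitivity) for the cutoff that gives the
    highest sensitivity among cutoffs whose PPV meets target_ppv.

    A case is predicted positive when its score is >= cutoff.
    Returns None if there are no true cases or no cutoff reaches the target.
    Hypothetical helper for illustration only.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return None

    best = None
    # Every distinct observed score is a candidate cutoff.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        ppv = tp / (tp + fp)  # tp + fp >= 1 because cutoff is an observed score
        sensitivity = tp / total_pos
        if ppv >= target_ppv and (best is None or sensitivity > best[2]):
            best = (cutoff, ppv, sensitivity)
    return best


# Toy data (invented): predicted scores and chart-review labels (1 = confirmed case).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0]
print(pick_cutpoint(scores, labels, target_ppv=0.90))
```

Lowering `target_ppv` generally moves the chosen cutoff down and raises sensitivity, which is the trade-off the abstract reports as 90% PPV at 87% sensitivity in the training data.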
Pages: 413-420
Page count: 8
Related papers
50 in total
  • [31] Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records
    Savova, Guergana K.
    Danciu, Ioana
    Alamudun, Folami
    Miller, Timothy
    Lin, Chen
    Bitterman, Danielle S.
    Tourassi, Georgia
    Warner, Jeremy L.
    CANCER RESEARCH, 2019, 79 (21) : 5463 - 5470
  • [32] A Case Study of the Incremental Utility for Disease Identification of Natural Language Processing in Electronic Medical Records
    Weiss L.S.
    Zhou X.
    Walker A.M.
    Ananthakrishnan A.N.
    Shen R.
    Sobel R.E.
    Bate A.
    Reynolds R.F.
    Pharmaceutical Medicine, 2018, 32 (1) : 31 - 37
  • [33] Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records
    Luo, Yuan
    Szolovits, Peter
    BIOMEDICAL INFORMATICS INSIGHTS, 2016, 8
  • [34] Natural language processing of electronic medical records identifies cardioprotective agents for anthracycline induced cardiotoxicity
    Kawazoe, Yoshimasa
    Tsuchiya, Masami
    Shimamoto, Kiminori
    Seki, Tomohisa
    Shinohara, Emiko
    Yada, Shuntaro
    Wakamiya, Shoko
    Imai, Shungo
    Aramaki, Eiji
    Hori, Satoko
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [35] Applying natural language processing to electronic medical records for estimating healthy life expectancy Comment
    Weegar, Rebecka
    LANCET REGIONAL HEALTH-WESTERN PACIFIC, 2021, 9
  • [36] Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing
    Barbour, Kristen
    Hesdorffer, Dale C.
    Tian, Niu
    Yozawitz, Elissa G.
    McGoldrick, Patricia E.
    Wolf, Steven
    McDonough, Tiffani L.
    Nelson, Aaron
    Loddenkemper, Tobias
    Basma, Natasha
    Johnson, Stephen B.
    Grinspan, Zachary M.
    EPILEPSIA, 2019, 60 (06) : 1209 - 1220
  • [37] Using Natural Language Processing to Identify Different Lens Pathology in Electronic Health Records
    Stein, Joshua D.
    Zhou, Yunshu
    Andrews, Chris A.
    Kim, Judy E.
    Addis, Victoria
    Bixler, Jill
    Grove, Nathan
    Mcmillan, Brian
    Munir, Saleha Z.
    Pershing, Suzann
    Schultz, Jeffrey S.
    Stagg, Brian C.
    Wang, Sophia Y.
    Woreta, Fasika
    AMERICAN JOURNAL OF OPHTHALMOLOGY, 2024, 262 : 153 - 160
  • [38] Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records
    Fu, Sunyang
    Lopes, Guilherme S.
    Pagali, Sandeep R.
    Thorsteinsdottir, Bjoerg
    LeBrasseur, Nathan K.
    Wen, Andrew
    Liu, Hongfang
    Rocca, Walter A.
    Olson, Janet E.
    St Sauver, Jennifer
    Sohn, Sunghwan
    JOURNALS OF GERONTOLOGY SERIES A-BIOLOGICAL SCIENCES AND MEDICAL SCIENCES, 2022, 77 (03) : 524 - 530
  • [39] Classifying Firearm Injury Intent in Electronic Hospital Records Using Natural Language Processing
    MacPhaul, Erin
    Zhou, Li
    Mooney, Stephen J.
    Azrael, Deborah
    Bowen, Andrew
    Rowhani-Rahbar, Ali
    Yenduri, Ravali
    Barber, Catherine
    Goralnick, Eric
    Miller, Matthew
    JAMA NETWORK OPEN, 2023, 6 (04) : E235870
  • [40] Using a natural language processing toolkit to classify electronic health records by psychiatric diagnosis
    Hutto, Alissa
    Zikry, Tarek M.
    Bohac, Buck
    Rose, Terra
    Staebler, Jasmine
    Slay, Janet
    Cheever, C. Ray
    Kosorok, Michael R.
    Nash, Rebekah P.
    HEALTH INFORMATICS JOURNAL, 2024, 30 (04)