Machine learning to identify chronic cough from administrative claims data

Cited by: 0
Authors
Bali, Vishal [1]
Turzhitsky, Vladimir [1]
Schelfhout, Jonathan [1]
Paudel, Misti [2]
Hulbert, Erin [2]
Peterson-Brandt, Jesse [2]
Hertzberg, Jeffrey [3]
Kelly, Neal R. [3]
Patel, Raja H. [3]
Affiliations
[1] Merck & Co Inc, Ctr Observat & Real World Evidence CORE, Rahway, NJ 07065 USA
[2] Optum Insight, Hlth Econ & Outcomes Res HEOR, Eden Prairie, MN USA
[3] OptumLabs, Minnetonka, MN USA
Keywords
HEART-FAILURE; DIAGNOSIS; MODELS; ADULTS
DOI
10.1038/s41598-024-51522-9
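Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract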
Accurate identification of patient populations is an essential component of clinical research, especially for medical conditions such as chronic cough that are inconsistently defined and diagnosed. We aimed to develop and compare machine learning models to identify chronic cough from medical and pharmacy claims data. In this retrospective observational study, we compared three machine learning algorithms (XGBoost, logistic regression, and a neural network) using a large claims and electronic health record database. Of the 327,423 patients who met the study criteria, 4,818 had chronic cough based on linked claims-electronic health record data. The XGBoost model showed the best performance, achieving an area under the receiver operating characteristic curve (ROC-AUC) of 0.916. We selected a cutoff that favors a high positive predictive value (PPV) to minimize false positives, resulting in a sensitivity, specificity, PPV, and negative predictive value of 18.0%, 99.6%, 38.7%, and 98.8%, respectively, on the held-out testing set (n = 82,262). The logistic regression and neural network models achieved lower ROC-AUCs of 0.907 and 0.838, respectively. The XGBoost and logistic regression models maintained their robust performance in subgroups of individuals with higher rates of chronic cough. Machine learning algorithms are one way of identifying conditions that are not coded in medical records, and can help identify individuals with chronic cough from claims data with a high degree of classification value.
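The trade-off the abstract describes, in which a stricter cutoff raises PPV at the cost of sensitivity, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy scores and labels, and the two cutoffs are hypothetical and are not the study's data, model, or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion_at_cutoff(scores, labels, cutoff):
    """Count outcomes when a score >= cutoff is classified as positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def classification_metrics(c):
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical toy data (NOT from the study): model scores and true labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 1, 0, 0, 0]

# A lenient cutoff and a strict cutoff, both chosen only for illustration.
low = classification_metrics(confusion_at_cutoff(scores, labels, 0.25))
high = classification_metrics(confusion_at_cutoff(scores, labels, 0.90))

for name, metrics in (("cutoff 0.25", low), ("cutoff 0.90", high)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy data, the stricter cutoff flags fewer patients, so PPV rises while sensitivity falls, which is the same direction of trade-off as the high-specificity, low-sensitivity operating point reported above.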
Pages: 10