Using machine learning to improve anaphylaxis case identification in medical claims data

Cited by: 0
Authors
Kural, Kamil Can [1 ,2 ]
Mazo, Ilya [1 ]
Walderhaug, Mark [1 ]
Santana-Quintero, Luis [1 ]
Karagiannis, Konstantinos [1 ]
Thompson, Elaine E. [1 ]
Kelman, Jeffrey A. [3 ,4 ]
Goud, Ravi [1 ]
Affiliations
[1] US FDA, Ctr Biol Evaluat & Res CBER, 10903 New Hampshire Ave, Silver Spring, MD 20993 USA
[2] George Mason Univ, Sch Syst Biol, Manassas, VA 20110 USA
[3] Ctr Medicare, Washington, DC 20001 USA
[4] Ctr Medicaid Serv, Washington, DC 20001 USA
Keywords
anaphylaxis; machine learning; public health; allergy; electronic health records; Centers for Medicare & Medicaid Services; SUBSET;
DOI
10.1093/jamiaopen/ooae037
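Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract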
Objectives: Anaphylaxis is a severe life-threatening allergic reaction, and its accurate identification in healthcare databases can harness the potential of "Big Data" for healthcare or public health purposes.
Materials and methods: This study used claims data obtained between October 1, 2015 and February 28, 2019 from the CMS database to examine the utility of machine learning in identifying incident anaphylaxis cases. We created a feature selection pipeline to identify critical features between different datasets. Then a variety of unsupervised and supervised methods were used (eg, Sammon mapping and eXtreme Gradient Boosting) to train models on datasets of differing data quality, which reflects the varying availability and potential rarity of ground truth data in medical databases.
Results: Resulting machine learning model accuracies ranged from 47.7% to 94.4% when tested on ground truth data. Finally, we found new features to help experts enhance existing case-finding algorithms.
Discussion: Developing precise algorithms to detect medical outcomes in claims can be a laborious and expensive process, particularly for conditions presented and coded diversely. We found it beneficial to filter out highly potent codes used for data curation to identify underlying patterns and features. To improve rule-based algorithms where necessary, researchers could use model explainers to determine noteworthy features, which could then be shared with experts and included in the algorithm.
Conclusion: Our work suggests machine learning models can perform at similar levels as a previously published expert case-finding algorithm, while also having the potential to improve performance or streamline algorithm construction processes by identifying new relevant features for algorithm construction.
Lay Summary: Electronic health records and medical claims data are a potential treasure trove for identifying the new underlying content and confirming the existing knowledge base. However, whenever researchers introduce screening criteria in the data curation process, they will also introduce bias if they are not careful. Therefore, it is crucial to consider what information can go into machine learning models. In this work, we show how we used feature elimination and feature selection to replicate the success of human expert-defined anaphylaxis identification models. We then used common and essential features between minimally curated and expert-defined datasets to create a new machine-learning model that can beat the human expert-defined algorithms. This process can be repeated and automated to iteratively develop better models and features, which can help healthcare practitioners design more successful case-defining algorithms.
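The filtering idea in the Discussion (setting aside the highly potent codes used for data curation before looking for underlying features) can be illustrated with a small sketch. This is not the authors' pipeline: the code names, the toy records, and the prevalence-difference score are all hypothetical, and the score is only a simple stand-in for the model explainers mentioned in the abstract.

```python
from collections import Counter

# Hypothetical placeholder names for codes used to curate the dataset.
# Such codes would trivially predict the label, so they are removed first.
CURATION_CODES = {"CURATION_A", "CURATION_B"}


def drop_curation_codes(records, curation_codes):
    """Return new (codes, label) pairs with the curation codes removed."""
    return [(codes - curation_codes, label) for codes, label in records]


def rank_codes(records):
    """Rank codes by prevalence among cases minus prevalence among non-cases."""
    case_counts, other_counts = Counter(), Counter()
    n_cases = sum(1 for _, label in records if label)
    n_others = len(records) - n_cases
    for codes, label in records:
        (case_counts if label else other_counts).update(codes)
    scores = {
        code: case_counts[code] / max(n_cases, 1)
        - other_counts[code] / max(n_others, 1)
        for code in set(case_counts) | set(other_counts)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy claims records (invented for illustration): a set of codes and a case label.
records = [
    ({"CURATION_A", "EPI_INJ", "ER_VISIT"}, True),
    ({"CURATION_A", "EPI_INJ"}, True),
    ({"CURATION_B", "ER_VISIT", "EPI_INJ"}, True),
    ({"ER_VISIT"}, False),
    ({"OFFICE_VISIT"}, False),
    ({"ER_VISIT", "OFFICE_VISIT"}, False),
]

filtered = drop_curation_codes(records, CURATION_CODES)
ranking = rank_codes(filtered)
for code, score in ranking:
    print(f"{code}: {score:+.2f}")
```

With the curation codes removed, the ranking surfaces the remaining codes that separate cases from non-cases, which is the kind of candidate feature list that could be handed to experts for review.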
Pages: 10
Related papers
50 in total
  • [1] Using machine learning to improve anaphylaxis case identification in medical claims data
    Kural, Kamil Can
    Mazo, Ilya
    Walderhaug, Mark
    Santana-Quintero, Luis
    Karagiannis, Konstantinos
    Thompson, Elaine E.
    Kelman, Jeffrey A.
    Goud, Ravi
    [J]. JAMIA OPEN, 2023, 6 (04)
  • [2] Development and validation of a machine learning algorithm to identify anaphylaxis in US administrative claims data
    Beachler, Daniel C.
    Taylor, Devon H.
    Anthony, Mary S.
    Yin, Ruihua
    Li, Ling
    Saltus, Catherine W.
    Li, Lin
    Shaunik, Alka
    Walsh, Kathleen E.
    Lanes, Stephan
    Rothman, Kenneth J.
    Johannes, Catherine
    Aroda, Vanita
    Carr, Warner
    Goldberg, Pinkus
    Accardi, Andrew
    O'Shura, J. Shane
    Sharma, Kristen
    Juhaeri, Juhaeri
    Wu, Chuntao
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2020, 29 : 602 - 602
  • [3] Medical Device Identification in Claims Data
    Moscovitch, Ben
    Rising, Josh P.
    [J]. JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2017, 318 (19) : 1936 - 1937
  • [4] Early detection of autism spectrum disorder in young children with machine learning using medical claims data
    Chen, Yu-Hsin
    Chen, Qiushi
    Kong, Lan
    Liu, Guodong
    [J]. BMJ HEALTH & CARE INFORMATICS, 2022, 29 (01)
  • [5] VALIDATING A MACHINE-LEARNING APPROACH TO CANCER STAGE IDENTIFICATION USING MEDICARE CLAIMS AND SEER DATA
    Smith, R.
    Miller-Wilson, L. A.
    Ho, N.
    Carter, Cuyun G.
    Fayyaz, I
    Pope, A.
    Pelizzari, P.
    Pyenson, B.
    [J]. VALUE IN HEALTH, 2023, 26 (06) : S283 - S283
  • [6] Supplementing Claims Data with Electronic Medical Records to Improve Estimation and Classification of Rheumatoid Arthritis Disease Activity: A Machine Learning Approach
    Feldman, Candace H.
    Yoshida, Kazuki
    Xu, Chang
    Frits, Michelle L.
    Shadick, Nancy A.
    Weinblatt, Michael E.
    Connolly, Sean E.
    Alemao, Evo
    Solomon, Daniel H.
    [J]. ACR OPEN RHEUMATOLOGY, 2019, 1 (09) : 552 - 559
  • [7] Estimating Prevalence, Demographics, and Costs of ME/CFS Using Large Scale Medical Claims Data and Machine Learning
    Valdez, Ashley R.
    Hancock, Elizabeth E.
    Adebayo, Seyi
    Kiernicki, David J.
    Proskauer, Daniel
    Attewell, John R.
    Bateman, Lucinda
    DeMaria, Alfred, Jr.
    Lapp, Charles W.
    Rowe, Peter C.
    Proskauer, Charmian
    [J]. FRONTIERS IN PEDIATRICS, 2019, 6
  • [8] DEVELOPMENT OF MEDICAL COST PREDICTION MODEL BASED ON STATISTICAL MACHINE LEARNING USING HEALTH INSURANCE CLAIMS DATA
    Takeshima, T.
    Keino, S.
    Aoki, R.
    Matsui, T.
    Iwasaki, K.
    [J]. VALUE IN HEALTH, 2018, 21 : S97 - S97
  • [9] Identifying inpatient mortality in MarketScan claims data using machine learning
    Xie, Fenglong
    Beukelman, Timothy
    Sun, Dongmei
    Yun, Huifeng
    Curtis, Jeffrey R.
    [J]. PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2023, 32 (11) : 1299 - 1305
  • [10] Medical Device Identification in Claims Data Reply
    Ibrahim, Andrew M.
    Dimick, Justin B.
    [J]. JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2017, 318 (19) : 1937 - 1937