Prediction of Venous Thromboembolism in Diverse Populations Using Machine Learning and Structured Electronic Health Records

Cited by: 5
Authors
Chen, Robert [1 ,2 ,3 ]
Petrazzini, Ben Omega [1 ,3 ,4 ]
Malick, Waqas A. [5 ]
Rosenson, Robert S. [5 ]
Do, Ron [1 ,3 ,4 ,6 ]
Affiliations
[1] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
[2] Icahn Sch Med Mt Sinai, Med Scientist Training Program, New York, NY USA
[3] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY USA
[4] Icahn Sch Med Mt Sinai, Ctr Genom Data Analyt, New York, NY USA
[5] Icahn Sch Med Mt Sinai, Zena & Michael A Wiener Cardiovasc Inst, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Room 80B, Floor 18, Annenberg Bldg, 1468 Madison A, New York, NY 10029 USA
Funding
National Institutes of Health (USA)
Keywords
machine learning; medical records; morbidity; risk assessment; thrombosis; CLINICAL DECISION-SUPPORT; CELL DISTRIBUTION WIDTH; PULMONARY-EMBOLISM; MEDICAL PATIENTS; VIENNA CANCER; RISK-FACTORS; THROMBOSIS; THROMBOPROPHYLAXIS; ACCURACY; EVENTS;
DOI
10.1161/ATVBAHA.123.320331
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
BACKGROUND: Venous thromboembolism (VTE) is a major cause of morbidity and mortality worldwide. Current risk assessment tools, such as the Caprini and Padua scores and Wells criteria, have limitations in their applicability and accuracy. This study aimed to develop machine learning models using structured electronic health record data to predict diagnosis and 1-year risk of VTE.

METHODS: We trained and validated models on data from 159 001 participants in the Mount Sinai Data Warehouse. We then externally tested them on 401 723 participants in the UK Biobank and 123 039 participants in All of Us. All data sets contain populations of diverse ancestries and clinical histories. We used these data sets to develop small, medium, and large models with increasing numbers of features, spanning a range from optimizing portability to maximizing performance. We make trained models publicly available in click-and-run format at https://doi.org/10.17632/tkwzysr4y6.6.

RESULTS: In the holdout and external test sets, respectively, models achieved areas under the receiver operating characteristic curve of 0.80 to 0.83 and 0.72 to 0.82 for VTE diagnosis prediction and 0.76 to 0.78 and 0.64 to 0.69 for 1-year risk prediction, significantly outperforming the Padua score. Models also demonstrated robust performance across different VTE types and patient subsets, including ethnicity, age, and surgical and hospitalization status. Models identified both established and novel clinical features contributing to VTE risk, offering valuable insights into its underlying pathophysiology.

CONCLUSIONS: Machine learning models using structured electronic health record data can significantly improve VTE diagnosis and 1-year risk prediction in diverse populations. Model probability scores exist on a continuum, affecting mortality risk in both healthy individuals and VTE cases. Integrating these models into electronic health record systems to generate real-time predictions may enhance VTE risk assessment, early detection, and preventative measures, ultimately reducing the morbidity and mortality associated with VTE.
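The abstract reports model performance as area under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that metric measures, and not the authors' evaluation code, the following Python computes AUROC from the Mann-Whitney U statistic: the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name `auroc` and the toy labels and scores are assumptions made purely for illustration; they are not taken from the paper or its data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: sequence of 0/1 outcomes (1 = case, e.g. VTE in this toy example)
    scores: sequence of model probability scores, same length as labels
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging the rank over runs of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one case and one control")

    # U counts the (case, control) pairs in which the case scores higher,
    # with tied pairs counted as one half.
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data: three controls and three cases.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(auroc(toy_labels, toy_scores))  # 8 of 9 case-control pairs are ordered correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a frame for reading the reported ranges of 0.64 to 0.83.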
Pages: 491-504
Page count: 14