Prediction of Venous Thromboembolism in Diverse Populations Using Machine Learning and Structured Electronic Health Records

Cited by: 5
Authors
Chen, Robert [1 ,2 ,3 ]
Petrazzini, Ben Omega [1 ,3 ,4 ]
Malick, Waqas A. [5 ]
Rosenson, Robert S. [5 ]
Do, Ron [1 ,3 ,4 ,6 ]
Affiliations
[1] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
[2] Icahn Sch Med Mt Sinai, Med Scientist Training Program, New York, NY USA
[3] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY USA
[4] Icahn Sch Med Mt Sinai, Ctr Genom Data Analyt, New York, NY USA
[5] Icahn Sch Med Mt Sinai, Zena & Michael A Wiener Cardiovasc Inst, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Room 80B, Floor 18, Annenberg Bldg, 1468 Madison A, New York, NY 10029 USA
Funding
US National Institutes of Health
Keywords
machine learning; medical records; morbidity; risk assessment; thrombosis; CLINICAL DECISION-SUPPORT; CELL DISTRIBUTION WIDTH; PULMONARY-EMBOLISM; MEDICAL PATIENTS; VIENNA CANCER; RISK-FACTORS; THROMBOSIS; THROMBOPROPHYLAXIS; ACCURACY; EVENTS;
DOI
10.1161/ATVBAHA.123.320331
Chinese Library Classification
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
BACKGROUND: Venous thromboembolism (VTE) is a major cause of morbidity and mortality worldwide. Current risk assessment tools, such as the Caprini and Padua scores and Wells criteria, have limitations in their applicability and accuracy. This study aimed to develop machine learning models using structured electronic health record data to predict diagnosis and 1-year risk of VTE.
METHODS: We trained and validated models on data from 159 001 participants in the Mount Sinai Data Warehouse. We then externally tested them on 401 723 participants in the UK Biobank and 123 039 participants in All of Us. All data sets contain populations of diverse ancestries and clinical histories. We used these data sets to develop small, medium, and large models with increasing features on a range of optimizing portability to maximizing performance. We make trained models publicly available in click-and-run format at https://doi.org/10.17632/tkwzysr4y6.6.
RESULTS: In the holdout and external test sets, respectively, models achieved areas under the receiver operating characteristic curve of 0.80 to 0.83 and 0.72 to 0.82 for VTE diagnosis prediction and 0.76 to 0.78 and 0.64 to 0.69 for 1-year risk prediction, significantly outperforming the Padua score. Models also demonstrated robust performance across different VTE types and patient subsets, including ethnicity, age, and surgical and hospitalization status. Models identified both established and novel clinical features contributing to VTE risk, offering valuable insights into its underlying pathophysiology.
CONCLUSIONS: Machine learning models using structured electronic health record data can significantly improve VTE diagnosis and 1-year risk prediction in diverse populations. Model probability scores exist on a continuum, affecting mortality risk in both healthy individuals and VTE cases. Integrating these models into electronic health record systems to generate real-time predictions may enhance VTE risk assessment, early detection, and preventative measures, ultimately reducing the morbidity and mortality associated with VTE.
Pages: 491-504
Page count: 14