Development and validation of explainable machine-learning models for carotid atherosclerosis early screening

Times cited: 4
Authors
Yun, Ke [1, 2]
He, Tao [3]
Zhen, Shi [4]
Quan, Meihui [1, 2]
Yang, Xiaotao [1, 2]
Man, Dongliang [1, 2]
Zhang, Shuang [1, 2]
Wang, Wei [5]
Han, Xiaoxu [1, 2, 6, 7]
Affiliations
[1] China Med Univ, Affiliated Hosp 1, Natl Clin Res Ctr Lab Med, Shenyang, Liaoning, Peoples R China
[2] China Med Univ, Affiliated Hosp 1, Dept Lab Med, Shenyang, Liaoning, Peoples R China
[3] Neusoft Corp, Neusoft Res Inst, Shenyang, Liaoning, Peoples R China
[4] Northeastern Univ, Dept Software Engn, Shenyang, Liaoning, Peoples R China
[5] China Med Univ, Affiliated Hosp 1, Dept Phys Examinat Ctr, Shenyang, Liaoning, Peoples R China
[6] Chinese Acad Med Sci, Lab Med Innovat Unit, Shenyang, Liaoning, Peoples R China
[7] China Med Univ, Affiliated Hosp 1, NHC Key Lab AIDS Immunol, Shenyang, Liaoning, Peoples R China
Keywords
Machine learning; Carotid atherosclerosis; Explainable model; CHINESE ADULTS; RISK-FACTORS; PREVALENCE; ULTRASOUND; BURDEN; AGE; GENDER
DOI
10.1186/s12967-023-04093-8
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
Background: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. This study aimed to establish and validate machine learning (ML) models for the early screening of CAS using routine health check-up indicators in northeast China.

Methods: A total of 69,601 health check-up records were collected between 2018 and 2019 from the health examination center of the First Hospital of China Medical University (Shenyang, China). Of the 2019 records, 80% were assigned to the training set and 20% to the testing (internal validation) set; the 2018 records served as the external validation dataset. Ten ML algorithms were used to construct CAS screening models: decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear). Model performance was measured by the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR). The SHapley Additive exPlanations (SHAP) method was used to interpret the optimal model.

Results: A total of 6315 records of patients who underwent carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model performed best, with an auROC of 0.860 (95% CI 0.839-0.880) in the internal validation dataset and 0.851 (95% CI 0.837-0.863) in the external validation dataset. The model showed a low negative predictive value in individuals with diabetes or those over 65 years of age.
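The evaluation measures named in the abstract (auROC, auPR, and the negative predictive value, NPV) can be illustrated with a short, dependency-free Python sketch. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold used for NPV are assumptions for illustration only. auPR is estimated here by average precision, one common estimator; the abstract does not state which estimator the study used. auROC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """auROC as P(random positive scores higher than random negative), ties
    counting 1/2 (the Mann-Whitney U formulation). The O(P * N) pairwise loop
    is fine for illustration but slow on large datasets."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, one common auPR estimator: the sum over score
    thresholds of (gain in recall) * (precision at that threshold)."""
    total_pos = sum(1 for y in y_true if y == 1)
    if total_pos == 0:
        raise ValueError("auPR needs at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Records with tied scores enter together as a single threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def npv(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Negative predictive value TN / (TN + FN). A record is predicted negative
    when its score is below `threshold` (0.5 is an assumed cut-off)."""
    tn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    return tn / (tn + fn) if tn + fn else float("nan")


def subgroup_npv(y_true: Sequence[int], scores: Sequence[float],
                 in_group: Sequence[bool], threshold: float = 0.5) -> float:
    """NPV restricted to records flagged True in `in_group` (e.g. age > 65)."""
    ys = [y for y, g in zip(y_true, in_group) if g]
    ss = [s for s, g in zip(scores, in_group) if g]
    return npv(ys, ss, threshold)


if __name__ == "__main__":
    # Toy labels and model scores (hypothetical, not study data).
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y, s))                        # 0.75
    print(round(average_precision(y, s), 4))  # 0.8333
    print(round(npv(y, s), 4))                # 0.6667
```

The `subgroup_npv` helper sketches the kind of stratified check behind the low NPV reported for individuals with diabetes or over 65: the same metric is recomputed only on the records that satisfy a group condition.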
In the interpretability analysis, age was the most important factor influencing the predictions of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol (non-HDL-C).

Conclusions: The ML models developed here performed well in identifying CAS from routine health check-up indicators and may be applicable to CAS prevention in populations without ethnic and geographic heterogeneity relative to the study population.
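SHAP explains a model's prediction by assigning each input feature a Shapley value from cooperative game theory. As a hedged illustration only, the sketch below computes exact Shapley values by enumerating every feature coalition, filling in "absent" features from a single baseline record (one of several conventions), and ranks features globally by mean absolute Shapley value. The model `toy_risk`, its coefficients, the feature encodings (sex as 0/1, non-high-density lipoprotein cholesterol in mmol/L) and the records are hypothetical stand-ins, not the study's GBDT model or data. Brute-force enumeration needs on the order of 2^n model evaluations, so it only suits a handful of features; for tree ensembles the SHAP library uses its TreeExplainer algorithm instead.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x). Features outside a coalition take
    their baseline value (a single-reference convention, assumed here)."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model: Model, X: Sequence[Sequence[float]],
                        baseline: Sequence[float]) -> List[float]:
    """Global importance: mean |Shapley value| per feature over the records in X."""
    totals = [0.0] * len(baseline)
    for x in X:
        for j, v in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(v)
    return [t / len(X) for t in totals]


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical logistic risk over (age in years, sex coded 0/1,
    non-HDL-C in mmol/L). The coefficients are arbitrary, not fitted."""
    age, sex, non_hdl_c = z
    return 1.0 / (1.0 + exp(-(-6.0 + 0.08 * age + 0.5 * sex + 0.3 * non_hdl_c)))


if __name__ == "__main__":
    names = ["age", "sex", "non_hdl_c"]
    records = [[70, 1, 4.5], [45, 0, 3.0], [62, 1, 5.1]]  # toy records
    # Baseline = column means of the toy background records (an assumption).
    baseline = [sum(col) / len(col) for col in zip(*records)]
    importance = mean_abs_importance(toy_risk, records, baseline)
    for name, value in sorted(zip(names, importance), key=lambda t: -t[1]):
        print(f"{name}: {value:.4f}")
```

A useful sanity check is the efficiency property: for every record, the Shapley values sum to `model(x) - model(baseline)`, so the attributions fully account for the gap between a prediction and the reference prediction.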
Pages: 13
Related papers (50 in total)
  • [21] Early prediction of noninvasive ventilation failure after extubation: development and validation of a machine-learning model
    Wang, Huan
    Zhao, Qin-Yu
    Luo, Jing-Chao
    Liu, Kai
    Yu, Shen-Ji
    Ma, Jie-Fei
    Luo, Ming-Hao
    Hao, Guang-Wei
    Su, Ying
    Zhang, Yi-Jie
    Tu, Guo-Wei
    Luo, Zhe
    BMC Pulmonary Medicine, 22
  • [22] Development of a machine-learning based voice disorder screening tool
    Reid, Jonathan
    Parmar, Preet
    Lund, Tyler
    Aalto, Daniel K.
    Jeffery, Caroline C.
    AMERICAN JOURNAL OF OTOLARYNGOLOGY, 2022, 43 (02)
  • [23] Certified Machine-Learning Models
    Damiani, Ernesto
    Ardagna, Claudio A.
    SOFSEM 2020: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2020, 12011 : 3 - 15
  • [24] Explainable machine learning models with privacy
    Bozorgpanah, Aso
    Torra, Vicenc
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2024, 13 (01) : 31 - 50
  • [26] Predicting Marshall stability and flow parameters in asphalt pavements using explainable machine-learning models
    Asi, Ibrahim
    Alhadidi, Yusra I.
    Alhadidi, Taqwa I.
    Transportation Engineering, 2024, 18
  • [27] Machine-Learning Classification Models to Predict Liver Cancer with Explainable AI to Discover Associated Genes
    Hasan, Md Easin
    Mostafa, Fahad
    Hossain, Md S.
    Loftin, Jonathon
    APPLIEDMATH, 2023, 3 (02): 417 - 445
  • [28] Uncovering expression signatures of synergistic drug responses via ensembles of explainable machine-learning models
    Janizek, Joseph D.
    Dincer, Ayse B.
    Celik, Safiye
    Chen, Hugh
    Chen, William
    Naxerova, Kamila
    Lee, Su-In
    NATURE BIOMEDICAL ENGINEERING, 2023, 7 (06) : 811 - 829
  • [30] Development and Validation of Machine-Learning Models to Support Clinical Diagnosis for Non-Epileptic Psychogenic Seizures
    Zucco, Chiara
    Calabrese, Barbara
    Mancuso, Rossana
    Sturniolo, Miriam
    Pucci, Franco
    Gambardella, Antonio
    Cannataro, Mario
    APPLIED SCIENCES-BASEL, 2023, 13 (12)