Machine learning-based prediction of diabetic patients using blood routine data

Cited by: 0
Authors
Li, Honghao [1]
Su, Dongqing [1]
Zhang, Xinpeng [1]
He, Yuanyuan [1]
Luo, Xu [1]
Xiong, Yuqiang [1]
Zou, Min [1]
Wei, Huiyan [2]
Wen, Shaoran [3]
Xi, Qilemuge [3]
Zuo, Yongchun [3, 4]
Yang, Lei [1]
Affiliations
[1] Harbin Med Univ, Coll Bioinformat Sci & Technol, Harbin 150081, Peoples R China
[2] Harbin Med Univ, Biotechnol Expt Ctr, Harbin 150081, Peoples R China
[3] Inner Mongolia Univ, Coll Life Sci, State Key Lab Reprod Regulat & Breeding Grassland, Hohhot 010070, Peoples R China
[4] Inner Mongolia Int Mongolian Hosp, Hohhot 010065, Peoples R China
Keywords
Diabetes; Blood routine test; Machine learning; Nomogram
DOI
10.1016/j.ymeth.2024.07.001
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Diabetes is one of the most prevalent chronic diseases globally. Conventional methods for diagnosing diabetes are frequently overlooked until individuals manifest noticeable symptoms of the condition. This study aimed to address this gap by collecting comprehensive datasets, including 1000 instances of blood routine data from diabetes patients and an equivalent dataset from healthy individuals. To differentiate diabetes patients from their healthy counterparts, a computational framework was established, encompassing eXtreme Gradient Boosting (XGBoost), random forest, support vector machine, and elastic net algorithms. The XGBoost model was the most effective, with an area under the receiver operating characteristic curve (AUC) of 99.90% in the training set and 98.51% in the testing set. The model also performed well in external validation, achieving an overall accuracy of 81.54%. The probability generated by the model serves as a risk score for diabetes susceptibility. The Shapley additive explanations (SHAP) algorithm was used to interpret the model, identifying pivotal indicators such as mean corpuscular hemoglobin concentration (MCHC), lymphocyte ratio (LY%), standard deviation of red blood cell distribution width (RDW-SD), and mean corpuscular hemoglobin (MCH). This enhances our understanding of the predictive mechanisms underlying diabetes. To facilitate application in clinical and real-life settings, a nomogram was created based on the logistic regression algorithm, which can provide a preliminary assessment of the likelihood of an individual having diabetes. Overall, this research contributes valuable insights into the predictive modeling of diabetes, offering potential applications in clinical practice for more effective and timely diagnoses.
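The abstract reports model quality as AUC and overall accuracy computed from the model's output probability (the risk score). As a minimal sketch of how those two metrics can be derived from risk scores, the following standard-library Python may help. It is not the authors' code: the function names and the toy labels and scores are hypothetical and chosen only for illustration, and the 0.5 decision threshold is an assumption, since the abstract does not state one.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher risk score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases classified correctly when scores at or above the
    threshold are called positive (threshold value is an assumption)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: 1 = diabetes, 0 = healthy; scores are made up.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.92, 0.71, 0.40, 0.55, 0.20, 0.08]

toy_auc = roc_auc(toy_labels, toy_scores)   # 8 of 9 pairs ranked correctly
toy_acc = accuracy(toy_labels, toy_scores)  # 4 of 6 cases classified correctly
print(f"AUC = {toy_auc:.4f}, accuracy = {toy_acc:.4f}")
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; for datasets of the size described in the abstract, a rank-based implementation such as the one in a standard machine learning library would be the more practical choice.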
Pages: 156-162
Number of pages: 7