A machine learning-based diabetes risk prediction modeling study

Authors
Ming, Jiexiu [1]
Xu, Junyi [1]
Zhang, Miaomiao [1]
Li, Ningyu [1]
Yan, Xu [2]
Affiliations
[1] Wuhan Donghu Univ, Wuhan 430212, Hubei, Peoples R China
[2] Wuhan Inst Technol, Univ Hosp, Wuhan 430205, Hubei, Peoples R China
Keywords
Factor analysis; Machine learning; Bayesian optimization; Support vector machine regression; SVR
DOI
10.1145/3675249.3675313
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a chronic metabolic disease characterized mainly by insufficient insulin secretion or impaired insulin action, which results in elevated blood glucose. According to the World Health Organization (WHO), the number of people with diabetes worldwide has been rising in recent years, and the disease has become a major global public health problem. In this paper, we first used a Random Forest-based feature importance screening method to retain the variables with larger feature weights. We then performed Spearman correlation analysis and selected the top 10 variables with lower correlations, and used information entropy theory and correlation analysis to test the representativeness and independence of the main variables. The main variables finally screened out were platelet volume distribution width, HDL cholesterol, proportion of white globules, platelet specific volume, platelet count, red blood cell count, lymphocyte %, albumin, neutrophil %, and leukocyte count. Blood glucose prediction models were then established through data mining techniques. Five machine learning models were selected for prediction: Extreme Gradient Boosted Tree (XGBoost), Random Forest regression, support vector machine regression (SVR), LightGBM, and Gradient Boosted Decision Tree (GBDT). Each model was trained on the training set and then evaluated on the test set using the root mean squared error (MSE), mean absolute error (MAE), and maximum absolute error (MAS). Comparing the five models, SVR had the highest overall accuracy.
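The screening and evaluation steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the greedy keep-or-drop rule, the `max_abs_rho` threshold, and the function names `screen` and `regression_errors` are assumptions, and the importance scores are taken as given (for example, from a fitted random forest). The abstract labels its first metric MSE while naming it root mean squared error; the sketch computes the root form, which is also an assumption.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def screen(columns, importance, max_abs_rho=0.8, top_k=10):
    """Walk features from most to least important and keep a feature only if
    its |rho| with every feature already kept is below max_abs_rho
    (hypothetical rule and threshold)."""
    kept = []
    for name in sorted(importance, key=importance.get, reverse=True):
        if all(abs(spearman(columns[name], columns[k])) < max_abs_rho for k in kept):
            kept.append(name)
        if len(kept) == top_k:
            break
    return kept


def regression_errors(y_true, y_pred):
    """Root mean squared, mean absolute, and maximum absolute error."""
    residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    return {
        "rmse": math.sqrt(mse),
        "mae": sum(residuals) / len(residuals),
        "max_abs": max(residuals),
    }
```

Calling `screen(columns, importance)` with a dict of feature columns and a dict of importance scores returns the retained feature names in importance order; `regression_errors` can then be applied to each model's test-set predictions to compare them on the same three metrics.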
To establish an SVR blood glucose prediction model based on Bayesian optimization, the sample data are first normalized and the parameters are initially corrected using Bayesian principles. The support vector machine estimation algorithm is then selected to initialize the model, the parameters are inferred using the Bayesian evidence framework, and the optimal model is obtained after several iterations. The SVR model trained with the optimal hyperparameters obtained from Bayesian optimization shows improved accuracy on all three evaluation metrics.
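One way such a tuning loop might look is sketched below with scikit-learn, NumPy, and SciPy. The abstract does not specify the surrogate model, the acquisition function, or the search ranges, so everything here is an assumption: a Gaussian-process surrogate with expected improvement stands in for the paper's Bayesian procedure, the log-scale `BOUNDS` are illustrative, and `cv_rmse`, `expected_improvement`, and `tune_svr` are hypothetical names.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Search box in log10 space for C, gamma, epsilon (illustrative ranges).
BOUNDS = np.array([[-1.0, 3.0], [-4.0, 1.0], [-3.0, 0.0]])


def cv_rmse(log_params, X, y):
    """Cross-validated RMSE of a min-max-normalized SVR at one point."""
    C, gamma, epsilon = 10.0 ** np.asarray(log_params)
    model = make_pipeline(MinMaxScaler(), SVR(C=C, gamma=gamma, epsilon=epsilon))
    scores = cross_val_score(model, X, y, cv=3,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


def expected_improvement(candidates, gp, best):
    """Expected reduction below the best loss seen so far (minimization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


def tune_svr(X, y, n_init=5, n_iter=15, seed=0):
    """Return the best (C, gamma, epsilon) found and the loss history."""
    rng = np.random.default_rng(seed)

    def sample(n):
        return rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n, len(BOUNDS)))

    tried = sample(n_init)  # random initial design
    losses = np.array([cv_rmse(p, X, y) for p in tried])
    for _ in range(n_iter):
        # Refit the surrogate on every (hyperparameters, loss) pair so far.
        gp = GaussianProcessRegressor(Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(tried, losses)
        # Score random candidates and evaluate the most promising one.
        candidates = sample(500)
        ei = expected_improvement(candidates, gp, losses.min())
        nxt = candidates[np.argmax(ei)]
        tried = np.vstack([tried, nxt])
        losses = np.append(losses, cv_rmse(nxt, X, y))
    best = tried[np.argmin(losses)]
    return dict(zip(("C", "gamma", "epsilon"), 10.0 ** best)), losses
```

Given a feature matrix `X` and blood glucose targets `y`, `tune_svr(X, y)` returns a dict of hyperparameters that can be passed to `SVR(**params)` inside the same normalization pipeline. Searching in log space is a common choice for SVR because useful values of `C` and `gamma` typically span several orders of magnitude.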
Pages: 363-369
Number of pages: 7