A machine learning-based diabetes risk prediction modeling study

Cited by: 0

Authors:

Ming, Jiexiu [1]

Xu, Junyi [1]

Zhang, Miaomiao [1]

Li, Ningyu [1]

Yan, Xu [2]

Affiliations:

[1] Wuhan Donghu Univ, Wuhan 430212, Hubei, Peoples R China

[2] Wuhan Inst Technol, Univ Hosp, Wuhan 430205, Hubei, Peoples R China

Source:

PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024 | 2024

Keywords:

Factor analysis; Machine learning; Bayesian optimization; Support vector machine regression; SVR;

DOI:

10.1145/3675249.3675313

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Diabetes mellitus is a chronic metabolic disease characterized mainly by insufficient insulin secretion or impaired insulin action, which results in elevated blood glucose. According to the World Health Organization (WHO), the number of people with diabetes worldwide has been rising in recent years, making the disease an important global public health problem. In this paper, we used a Random Forest-based feature importance screening method to retain the variables with larger feature weights, performed Spearman correlation analysis, and selected the top 10 variables with lower mutual correlations. Information entropy theory and correlation analysis were then used to test the representativeness and independence of the main variables. The main variables finally screened out were platelet volume distribution width, HDL cholesterol, the proportion of white globules, platelet specific volume, platelet count, red blood cell count, lymphocyte %, albumin, neutrophil %, and leukocyte count. Blood glucose prediction models were established through data mining techniques. Five machine learning models were selected for prediction: Extreme Gradient Boosted Tree (XGBoost), Random Forest Regression, Support Vector Machine Regression (SVR), LightGBM, and Gradient Boosted Decision Tree (GBDT). Each model was trained on the training set and then evaluated on the test set to obtain its root mean squared error (MSE), mean absolute error (MAE), and maximum absolute error (MAS). Comparing the five models, SVR has the highest accuracy overall.
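The screening and comparison steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic (`make_regression`), the correlation threshold of 0.6 and the helper name `screen` are assumptions, the information entropy check is omitted, and only three of the five models are included so that the example depends on scikit-learn and SciPy alone.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the clinical table (the real data are not in the abstract).
X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: rank features by Random Forest importance, most important first.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
ranked = np.argsort(rf.feature_importances_)[::-1]

# Step 2: pairwise Spearman correlation matrix between features.
rho, _ = spearmanr(X_train)
rho = np.asarray(rho)


def screen(ranked, rho, k=10, threshold=0.6):
    """Hypothetical helper: keep up to k top-ranked features whose absolute
    Spearman correlation with every already-kept feature is below threshold."""
    kept = []
    for j in ranked:
        if all(abs(rho[j, i]) < threshold for i in kept):
            kept.append(int(j))
        if len(kept) == k:
            break
    return kept


selected = screen(ranked, rho)

# Step 3: fit each model on the screened features and score it on the test set.
models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}
results = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    pred = model.predict(X_test[:, selected])
    results[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "MaxAE": float(max_error(y_test, pred)),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

On synthetic data the ranking of the models carries no meaning; the sketch only shows how the three error measures named in the abstract could be computed side by side.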
To establish an SVR blood glucose prediction model based on Bayesian optimization, the sample data are normalized, the parameters are initially corrected using Bayesian principles, and the support vector machine estimation algorithm is then selected to initialize the model. The parameters are inferred using the Bayesian evidence framework, and the optimal model is established after several iterations. The SVR model trained with the optimal hyperparameters obtained from Bayesian optimization shows improved accuracy on all three evaluation metrics.
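One common reading of "Bayesian optimization" for SVR hyperparameters is a Gaussian-process surrogate with an expected-improvement acquisition function, sketched below. This is an assumption about the general technique, not the authors' procedure: the abstract also mentions a Bayesian evidence framework, which this sketch does not implement. The data are synthetic, and the search ranges for `C` and `gamma`, the iteration counts, and the helper names (`objective`, `sample`, `expected_improvement`) are all hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Synthetic stand-in for the screened clinical variables.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Assumed search space in log10 units: C in [1e-1, 1e3], gamma in [1e-3, 1e1].
bounds = np.array([[-1.0, 3.0], [-3.0, 1.0]])


def objective(params):
    """Cross-validated MSE of an SVR on min-max normalized inputs (lower is better)."""
    log_c, log_gamma = params
    model = make_pipeline(MinMaxScaler(),
                          SVR(C=10.0 ** log_c, gamma=10.0 ** log_gamma))
    scores = cross_val_score(model, X, y, cv=3,
                             scoring="neg_mean_squared_error")
    return -scores.mean()


def sample(n):
    """Draw n random points uniformly inside the search bounds."""
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, 2))


def expected_improvement(candidates, gp, best):
    """Expected improvement over the best observed loss (minimization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)


# Initial random design.
points = sample(5)
losses = np.array([objective(p) for p in points])
initial_best = losses.min()

# Bayesian optimization loop: fit the surrogate, pick the candidate with the
# highest expected improvement, evaluate it, and add it to the observations.
for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True,
                                  alpha=1e-6, random_state=0)
    gp.fit(points, losses)
    candidates = sample(500)
    ei = expected_improvement(candidates, gp, losses.min())
    nxt = candidates[np.argmax(ei)]
    points = np.vstack([points, nxt])
    losses = np.append(losses, objective(nxt))

best = points[np.argmin(losses)]
best_c, best_gamma = 10.0 ** best
print(f"best C={best_c:.3g}, gamma={best_gamma:.3g}, CV MSE={losses.min():.2f}")
```

Searching in log space is a design choice that suits `C` and `gamma`, whose useful values span several orders of magnitude. In practice a dedicated library would usually replace the hand-written loop.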


Pages: 363-369

Page count: 7
