Diabetes Prediction Using Derived Features and Ensembling of Boosting Classifiers

Cited by: 3

Authors:

Rajkamal, R. [1]

Karthi, Anitha [2]

Gao, Xiao-Zhi [3]

Affiliations:

[1] SRM Inst Sci & Technol, Sch Comp, Chennai, Tamil Nadu, India

[2] Bharat Inst Higher Educ & Res, Sch Comp, Chennai, Tamil Nadu, India

[3] Univ Eastern Finland, Sch Comp, Kuopio, Finland

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2022 / Vol. 73 / Issue 01

Keywords:

Diabetes prediction; feature engineering; highly informative features; ML models; ensembling models; MISSING-DATA;

DOI:

10.32604/cmc.2022.027142

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Diabetes is increasingly common in people's daily lives and poses a serious threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines. Several ML models have been developed around different datasets for diabetes prediction, and it is essential that these models predict diabetes accurately. Highly informative features of the dataset are vital in determining how well a model can predict diabetes. Feature engineering (FE) is the way forward for obtaining highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models is tested and analyzed for the prediction of diabetes. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw form. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used for the impact analysis of the proposed FE approach. The ensemble model performs well when the derived features are included, achieving a high Diagnostic Odds Ratio (DOR) of 117.694. This is a high margin of 8.2% over the ensemble model with no derived features (DOR = 96.306) in the experiment. Including the derived features with the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves on the state-of-the-art results.
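The evaluation metrics named in the abstract (Sensitivity, Specificity, DOR, and False Omission Rate) all follow from the four cells of a binary confusion matrix. The sketch below uses their standard textbook definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the paper, so the output is not expected to reproduce the reported values.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positive/negative counts. Assumes at least
    one actual positive and one actual negative (no zero denominators for
    sensitivity, specificity, or false omission rate).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    false_omission_rate = fn / (fn + tn)  # share of negative predictions that are wrong
    # Diagnostic Odds Ratio: (TP * TN) / (FP * FN); undefined when FP or FN is 0,
    # so this sketch reports infinity in that case.
    if fp == 0 or fn == 0:
        dor = float("inf")
    else:
        dor = (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "false_omission_rate": false_omission_rate,
    }


# Hypothetical counts for illustration only (not the paper's data).
metrics = diagnostic_metrics(tp=40, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A higher DOR indicates better discrimination between diabetic and non-diabetic cases, while a lower False Omission Rate means fewer diabetic cases hidden among the negative predictions.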


Pages: 2013-2033

Page count: 21
