Diabetes Prediction Using Derived Features and Ensembling of Boosting Classifiers

Cited by: 3
Authors
Rajkamal, R. [1]
Karthi, Anitha [2]
Gao, Xiao-Zhi [3]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Chennai, Tamil Nadu, India
[2] Bharat Inst Higher Educ & Res, Sch Comp, Chennai, Tamil Nadu, India
[3] Univ Eastern Finland, Sch Comp, Kuopio, Finland
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2022, Vol. 73, No. 01
Keywords
Diabetes prediction; feature engineering; highly informative features; ML models; ensembling models; MISSING-DATA
DOI
10.32604/cmc.2022.027142
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Diabetes is increasingly common in people's daily lives and represents an extraordinary threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines. Several ML models have been developed around different datasets for diabetes prediction. It is essential for ML models to predict diabetes accurately. Highly informative features of the dataset are vital in determining a model's capability to predict diabetes. Feature engineering (FE) is the way forward in yielding highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models is examined experimentally and analyzed for the prediction of diabetes. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence analyses are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw forms. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is used for the impact analysis of the proposed FE approach. With the derived features included, the ensemble model performs well, achieving a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% when compared with the ensemble model with no derived features (DOR = 96.306) included in the experiment. Including the derived features together with the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity (0.793), Specificity (0.945), DOR (79.517), and False Omission Rate (0.090), which further improves on the state-of-the-art results.
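The evaluation metrics named in the abstract (Sensitivity, Specificity, DOR, False Omission Rate) all follow from the four cells of a binary confusion matrix. The following is a minimal sketch using their standard textbook definitions. The class and function names are illustrative, and the counts in the usage example are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container, not from the paper)."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic
    fn: int  # diabetic, predicted non-diabetic


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def false_omission_rate(c: ConfusionCounts) -> float:
    """Share of negative predictions that are wrong: FN / (FN + TN)."""
    return c.fn / (c.fn + c.tn)


def diagnostic_odds_ratio(c: ConfusionCounts) -> float:
    """DOR = (TP * TN) / (FP * FN); infinite when there are no errors of one kind."""
    errors = c.fp * c.fn
    if errors == 0:
        return float("inf")
    return (c.tp * c.tn) / errors


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formulas.
    counts = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
    print("Sensitivity:", sensitivity(counts))
    print("Specificity:", specificity(counts))
    print("DOR:", diagnostic_odds_ratio(counts))
    print("False Omission Rate:", false_omission_rate(counts))
```

With these hypothetical counts the sketch gives a sensitivity of 0.8, a specificity of 0.9, a DOR of 36, and a False Omission Rate of 0.1. A higher DOR indicates better separation of the two classes, while a lower False Omission Rate means fewer diabetic cases hidden among the negative predictions.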
Pages: 2013-2033
Page count: 21