Diabetes Prediction Using Derived Features and Ensembling of Boosting Classifiers

Cited by: 3
Authors
Rajkamal, R. [1 ]
Karthi, Anitha [2 ]
Gao, Xiao-Zhi [3 ]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Chennai, Tamil Nadu, India
[2] Bharat Inst Higher Educ & Res, Sch Comp, Chennai, Tamil Nadu, India
[3] Univ Eastern Finland, Sch Comp, Kuopio, Finland
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2022, Vol. 73, No. 1
Keywords
Diabetes prediction; feature engineering; highly informative features; ML models; ensembling models; missing data
D O I
10.32604/cmc.2022.027142
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Diabetes is increasingly common in people's daily lives and poses an extraordinary threat to human well-being. Machine Learning (ML) in the healthcare industry has recently made headlines. Several ML models have been developed around different datasets for diabetes prediction, and it is essential that these models predict diabetes accurately. Highly informative features of the dataset are vital in determining the model's capability to predict diabetes. Feature engineering (FE) is the way forward for yielding highly informative features. The Pima Indian Diabetes Dataset (PIDD) is used in this work, and the impact of informative features on ML models for the prediction of diabetes is experimented with and analyzed. Missing values (MV) and the effect of the imputation process on the data distribution of each feature are analyzed. Permutation importance and partial dependence are carried out extensively, and the results reveal that Glucose (GLUC), Body Mass Index (BMI), and Insulin (INS) are highly informative features. Derived features are obtained for BMI and INS to add information beyond their raw forms. An ensemble classifier combining AdaBoost (AB) and XGBoost (XB) is considered for the impact analysis of the proposed FE approach. The ensemble model performs well with the inclusion of derived features, yielding a high Diagnostic Odds Ratio (DOR) of 117.694. This shows a high margin of 8.2% compared with the ensemble model without derived features (DOR = 96.306). The inclusion of derived features in the FE approach of the current state of the art makes the ensemble model perform well, with Sensitivity of 0.793, Specificity of 0.945, DOR of 79.517, and a False Omission Rate of 0.090, further improving the state-of-the-art results.
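The Diagnostic Odds Ratio reported in the abstract is a standard screening metric that can be computed either from confusion-matrix counts or from sensitivity and specificity. A minimal sketch, using illustrative counts that are not taken from the paper:

```python
# Diagnostic Odds Ratio (DOR) for a binary classifier.
# From confusion-matrix counts: DOR = (TP/FN) / (FP/TN) = (TP*TN) / (FP*FN).
# Equivalently, from sensitivity (Se) and specificity (Sp):
# DOR = (Se * Sp) / ((1 - Se) * (1 - Sp)).

def diagnostic_odds_ratio(tp, fn, fp, tn):
    """Compute DOR from confusion-matrix counts."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)

def dor_from_rates(sensitivity, specificity):
    """Compute DOR from sensitivity and specificity."""
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))

# Illustrative confusion matrix (not the paper's data).
print(diagnostic_odds_ratio(tp=79, fn=21, fp=11, tn=189))
```

A higher DOR means the odds of a positive test are more strongly concentrated in diseased subjects, which is why the abstract uses it to compare the ensembles with and without derived features.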
Pages: 2013-2033 (21 pages)
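The abstract does not spell out how the derived BMI feature is constructed. One common approach is to bin the raw value into WHO weight-status categories so the model sees both the continuous measurement and its clinical class; this sketch assumes that scheme, and the paper's actual derivation may differ:

```python
# Hypothetical derived feature for BMI: map the raw value to a WHO
# weight-status category (assumed binning, not confirmed by the paper).
def bmi_category(bmi):
    """Return the WHO weight-status category for a raw BMI value."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"

# A derived column pairs each raw BMI with its category label,
# giving the classifier the raw value plus a coarse clinical signal.
raw_bmi = [17.9, 22.4, 27.3, 33.6]
derived = [(bmi, bmi_category(bmi)) for bmi in raw_bmi]
print(derived)
```

The same idea applies to INS: a categorical bin (e.g. normal vs. elevated insulin) can be appended alongside the raw value as an additional input feature.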