Heart disease prediction using entropy based feature engineering and ensembling of machine learning classifiers

Cited by: 12
Authors
Rajendran, Rajkamal [1]
Karthi, Anitha [2]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Dept Comp Technol, Chennai 603203, India
[2] Bharath Inst Higher Educ & Res, Sch Comp, Dept Comp Sci & Engn, Chennai 600073, India
Keywords
Machine learning; Heart disease prediction; Ensemble model; Entropy-based feature engineering; Imputing missing values; Outlier removal; Feature selection; Framework
DOI
10.1016/j.eswa.2022.117882
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) in the healthcare industry has recently made headlines. Several ML models have been developed in state-of-the-art frameworks using different databases. However, performance improvements are still required to make ML models robust enough for accurate prediction of heart disease. The main aim of this work is to propose a new ML pipeline for accurate heart disease prediction. The pipeline includes pre-processing and an entropy-based feature engineering (FE) approach that produces high-quality features for better model performance. The heart disease dataset (HDD) is curated by combining the Cleveland, VA Medical Center, Hungarian, and Switzerland databases over 14 common attributes. In the curated HDD, missing values are imputed (IMV) based on the relations that exist between healthcare attributes, and outliers are removed (OR) based on the Mahalanobis distance. Experimental results revealed that IMV + OR pre-processing performs better than the other pre-processing methods evaluated. Analyses were carried out with different ML models, with the HDD subjected to IMV + OR pre-processing followed by Independent Component Analysis (ICA), Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), or the proposed entropy-based FE. Employing the proposed entropy-based FE with IMV + OR pre-processing showed remarkable improvement on all metrics for the Naive Bayes (NB) and Logistic Regression (LR) classifiers. Further, experimental results showed that the ensemble model (LR + NB) performed well under the proposed pipeline, with AUC (96.8%), accuracy (92.7%), specificity (91.5%), precision (92.5%), and F1 score (0.931), outperforming state-of-the-art results.
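The abstract does not spell out how the entropy-based FE is computed. As a rough illustration only, the sketch below ranks discrete features by information gain, which is one common entropy-based criterion and is not necessarily the authors' method. The function names (`shannon_entropy`, `information_gain`, `rank_features`) and the toy data are hypothetical and do not come from the paper or the HDD.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(
        (len(group) / total) * shannon_entropy(group) for group in groups.values()
    )
    return shannon_entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted by information gain, highest first."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not taken from the HDD).
    labels = [1, 1, 1, 0, 0, 0]
    columns = {
        "chest_pain": ["typical", "typical", "typical", "none", "none", "none"],
        "sex": ["m", "f", "m", "f", "m", "f"],
    }
    for name, gain in rank_features(columns, labels):
        print(f"{name}: {gain:.3f}")
```

A feature that separates the classes well receives a gain close to the full label entropy, while an uninformative one scores near zero. Continuous attributes would need to be discretized before a criterion of this kind could be applied.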
Pages: 15