Predicting stroke occurrences: a stacked machine learning approach with feature selection and data preprocessing

Cited by: 1
Authors
Chakraborty, Pritam [1]
Bandyopadhyay, Anjan [1]
Sahu, Preeti Padma [1]
Burman, Aniket [1]
Mallik, Saurav [2]
Alsubaie, Najah [3]
Abbas, Mohamed [4]
Alqahtani, Mohammed S. [5, 6]
Soufiene, Ben Othman [7]
Affiliations
[1] KIIT Univ, Sch Comp Engn, Bhubaneswar 751024, Odisha, India
[2] Harvard TH Chan Sch Publ Hlth, Dept Environm Hlth, 677 Huntington Ave, Boston, MA 02115 USA
[3] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 84428, Riyadh 11671, Saudi Arabia
[4] King Khalid Univ, Coll Engn, Elect Engn Dept, Abha 61421, Saudi Arabia
[5] King Khalid Univ, Coll Appl Med Sci, Radiol Sci Dept, Abha 61421, Saudi Arabia
[6] Univ Leicester, Space Res Ctr, BioImaging Unit, Michael Atiyah Bldg, Leicester LE1 7RH, England
[7] Univ Sousse, PRINCE Lab Res, ISITcom, Sousse, Tunisia
Source
BMC BIOINFORMATICS | 2024 / Vol. 25 / Issue 01
Keywords
Stroke prediction; Machine learning; Principal component analysis (PCA); Stacking ensemble; Healthcare analytics; Predictive modeling; Class imbalance; Feature selection; Early intervention;
DOI
10.1186/s12859-024-05866-8
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704
Abstract
Stroke prediction remains a critical area of research in healthcare, aiming to enhance early intervention and patient care strategies. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences based on demographic, clinical, and lifestyle factors. We systematically varied the number of PCA components and implemented a stacking model comprising random forest, decision tree, and K-nearest neighbors (KNN) classifiers. Our findings demonstrate that setting the number of PCA components to 16 gave the best predictive performance, achieving 98.6% accuracy in stroke prediction. Evaluation metrics underscored the robustness of our approach in handling class imbalance and improving model performance. Comparative analyses against traditional machine learning algorithms such as SVM, logistic regression, and Naive Bayes highlighted the superiority of our proposed method.
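The pipeline described in the abstract (PCA with 16 components feeding a stacking ensemble of random forest, decision tree, and KNN) might be sketched as follows with scikit-learn. This is an illustrative sketch, not the authors' implementation: the abstract does not name the meta-learner, the scaling step, the hyperparameters, or the imbalance-handling technique, so the logistic-regression final estimator, the `StandardScaler`, the helper name `build_stacked_model`, and the synthetic imbalanced data below are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_stacked_model(n_components=16, random_state=0):
    """Hypothetical helper: scale -> PCA -> stacking(RF, DT, KNN)."""
    # Base learners named in the abstract; hyperparameters are assumed.
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    # Assumption: logistic regression as meta-learner (not stated in the abstract).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # Assumption: features are standardized before PCA.
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


# Synthetic stand-in for the stroke data: 20 features, about 10% positive class.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=10,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacked_model(n_components=16)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Accuracy on this synthetic data says nothing about the 98.6% figure reported for the real dataset. With a class split this skewed, accuracy alone can also look high for a model that rarely predicts the minority class, so recall or F1 on the positive class is worth checking alongside it.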
Pages: 23