Predicting stroke occurrences: a stacked machine learning approach with feature selection and data preprocessing

Cited by: 1
Authors
Chakraborty, Pritam [1]
Bandyopadhyay, Anjan [1]
Sahu, Preeti Padma [1]
Burman, Aniket [1]
Mallik, Saurav [2]
Alsubaie, Najah [3]
Abbas, Mohamed [4]
Alqahtani, Mohammed S. [5, 6]
Soufiene, Ben Othman [7]
Affiliations
[1] KIIT Univ, Sch Comp Engn, Bhubaneswar 751024, Odisha, India
[2] Harvard TH Chan Sch Publ Hlth, Dept Environm Hlth, 677 Huntington Ave, Boston, MA 02115 USA
[3] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 84428, Riyadh 11671, Saudi Arabia
[4] King Khalid Univ, Coll Engn, Elect Engn Dept, Abha 61421, Saudi Arabia
[5] King Khalid Univ, Coll Appl Med Sci, Radiol Sci Dept, Abha 61421, Saudi Arabia
[6] Univ Leicester, Space Res Ctr, BioImaging Unit, Michael Atiyah Bldg, Leicester LE1 7RH, England
[7] Univ Sousse, PRINCE Lab Res, ISITcom, Sousse, Tunisia
Source
BMC BIOINFORMATICS | 2024 / Volume 25 / Issue 01
Keywords
Stroke prediction; Machine learning; Principal component analysis (PCA); Stacking ensemble; Healthcare analytics; Predictive modeling; Class imbalance; Feature selection; Early intervention;
DOI
10.1186/s12859-024-05866-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Stroke prediction remains a critical area of research in healthcare, aiming to enhance early intervention and patient care strategies. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences based on demographic, clinical, and lifestyle factors. We systematically varied the number of PCA components and implemented a stacking model comprising random forest, decision tree, and K-nearest neighbors (KNN) classifiers. Our findings show that setting the number of PCA components to 16 gave the best predictive performance, achieving 98.6% accuracy in stroke prediction. Evaluation metrics underscored the robustness of our approach in handling class imbalance and improving model performance. Comparative analyses against traditional machine learning algorithms such as SVM, logistic regression, and Naive Bayes indicated that the proposed method outperformed them.
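The pipeline described in the abstract (PCA with 16 components feeding a stacking ensemble of random forest, decision tree, and KNN) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the meta-learner, the preprocessing steps, or the hyperparameters, so the logistic-regression final estimator, the standard scaling step, the tree and neighbor settings, and the synthetic imbalanced dataset are all assumptions. The helper name `build_stacked_model` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_stacked_model(n_components=16, random_state=0):
    """Hypothetical helper: scale -> PCA -> stacking (RF, DT, KNN)."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    # Assumption: logistic regression as meta-learner (not stated in the abstract).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # Assumption: features are standardized before PCA.
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


# Synthetic stand-in for the stroke data: 20 features, roughly 10% positives.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=10,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

model = build_stacked_model(n_components=16)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

On imbalanced data like this, accuracy alone can look high even when the minority class is poorly predicted, so metrics such as recall or F1 on the positive class would normally be checked alongside it. The accuracy printed here comes from synthetic data and says nothing about the 98.6% figure reported in the abstract.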
Pages: 23