Predicting stroke occurrences: a stacked machine learning approach with feature selection and data preprocessing

Cited by: 1

Authors:

Chakraborty, Pritam [1]

Bandyopadhyay, Anjan [1]

Sahu, Preeti Padma [1]

Burman, Aniket [1]

Mallik, Saurav [2]

Alsubaie, Najah [3]

Abbas, Mohamed [4]

Alqahtani, Mohammed S. [5,6]

Soufiene, Ben Othman [7]

Affiliations:

[1] KIIT Univ, Sch Comp Engn, Bhubaneswar 751024, Odisha, India

[2] Harvard TH Chan Sch Publ Hlth, Dept Environm Hlth, 677 Huntington Ave, Boston, MA 02115 USA

[3] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 84428, Riyadh 11671, Saudi Arabia

[4] King Khalid Univ, Coll Engn, Elect Engn Dept, Abha 61421, Saudi Arabia

[5] King Khalid Univ, Coll Appl Med Sci, Radiol Sci Dept, Abha 61421, Saudi Arabia

[6] Univ Leicester, Space Res Ctr, BioImaging Unit, Michael Atiyah Bldg, Leicester LE1 7RH, England

[7] Univ Sousse, PRINCE Lab Res, ISITcom, Sousse, Tunisia

Source:

BMC BIOINFORMATICS | 2024 / Volume 25 / Issue 01

Keywords:

Stroke prediction; Machine learning; Principal component analysis (PCA); Stacking ensemble; Healthcare analytics; Predictive modeling; Class imbalance; Feature selection; Early intervention;

DOI:

10.1186/s12859-024-05866-8

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Stroke prediction remains a critical area of research in healthcare, aiming to enhance early intervention and patient care strategies. This study investigates the efficacy of machine learning techniques, particularly principal component analysis (PCA) and a stacking ensemble method, for predicting stroke occurrences based on demographic, clinical, and lifestyle factors. We systematically varied the number of PCA components and implemented a stacking model comprising random forest, decision tree, and K-nearest neighbors (KNN) classifiers. Our findings demonstrate that setting the number of PCA components to 16 gave the best predictive performance, achieving 98.6% accuracy in stroke prediction. Evaluation metrics underscored the robustness of our approach in handling class imbalance and improving model performance. Comparative analyses against traditional machine learning algorithms such as SVM, logistic regression, and Naive Bayes highlighted the superiority of our proposed method.
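
The pipeline described in the abstract (PCA reduced to 16 components, feeding a stacking ensemble of random forest, decision tree, and KNN) could be sketched roughly as follows with scikit-learn. This is an illustration only, not the authors' implementation: the synthetic dataset, the standardization step, the logistic-regression meta-learner, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (assumption: NOT the paper's dataset).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    weights=[0.9, 0.1],  # roughly 10% positive class, to mimic imbalance
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners named in the abstract; hyperparameters are assumptions.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The meta-learner is not stated in the abstract; logistic regression is assumed.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Standardize (assumed step), project onto 16 principal components, then stack.
model = make_pipeline(StandardScaler(), PCA(n_components=16), stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
balanced = balanced_accuracy_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f}, balanced accuracy={balanced:.3f}")
```

Balanced accuracy is reported next to plain accuracy because, on imbalanced data, a classifier that always predicts the majority class can still score a high plain accuracy. The scores on this synthetic data say nothing about the figures reported in the paper.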

Pages: 23
