Stroke Treatment Prediction Using Features Selection Methods and Machine Learning Classifiers

Cited by: 4
Authors
Chourib, I. [1, 2]
Guillard, G. [3]
Farah, I. R. [1]
Solaiman, B. [2]
Affiliations
[1] Natl Sch Comp Sci, STICODE Dept, RIADI Lab, Manouba, Tunisia
[2] IMT Atlantique, MATHSTIC Dept, ITI Lab, Brest, France
[3] Intradys, Brest, France
Keywords
Stroke disease; Feature selection; Data mining; Decision tree classifier; Naive Bayes; K-nearest neighbor; Recursive feature elimination; Tree-based model; Chi-square; Classification
DOI
10.1016/j.irbm.2022.02.002
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Objectives: Feature selection is an important task that helps alleviate various machine learning and data mining issues. The main objective of a feature selection method is to build simpler and more understandable classifier models in order to improve data mining and processing performance. To this end, a comparative evaluation of the Chi-square method, the recursive feature elimination method, and the tree-based method (using Random Forest), applied to three common machine learning methods (K-Nearest Neighbor, naive Bayesian classifier and decision tree classifier), is performed to select the most relevant features from a large set of attributes. Furthermore, the most suitable couple (i.e., feature selection method and machine learning method), the one that provides the best performance, is determined.
Materials and methods: In this paper, an overview of the most common feature selection techniques is first provided: the Chi-Square method, the Recursive Feature Elimination (RFE) method and the tree-based method (using Random Forest). A comparative evaluation of the improvement brought by these feature selection methods to the three common machine learning methods (K-Nearest Neighbor, naive Bayesian classifier and decision tree classifier) is then performed. For evaluation purposes, the following measures are used on the stroke disease data set: micro-F1, accuracy and root mean square error.
Results: The obtained results show that the proposed approach (i.e., the Tree-Based Method using Random Forest, TBM-RF, coupled with the decision tree classifier, DTC) provides an accuracy higher than 85% and an F1-score higher than 88%, and thus performs better than KNN and NB combined with the Chi-Square, RFE and TBM-RF methods.
Conclusion: This study shows that the couple formed by the Tree-Based Method using Random Forest (TBM-RF) and the decision tree classifier successfully and efficiently helps to find the most relevant features and to predict and classify patients suffering from stroke disease.
(c) 2022 AGBM. Published by Elsevier Masson SAS. All rights reserved.
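As a rough illustration of one of the three selection methods named in the abstract (Chi-square) and of the three evaluation measures (accuracy, micro-F1, root mean square error), the following is a minimal standard-library sketch. It is not the authors' implementation: the feature names (`hypertension`, `smoker`, `noise`), the toy values, the predictions and all function names are hypothetical and chosen only to make the computation easy to follow. RFE, the tree-based method and the classifiers themselves are not shown.

```python
import math
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for y_value, y_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * y_count / n
            score += (observed[(f_value, y_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives and false negatives summed over all classes."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)


def rmse(y_true, y_pred):
    """Root mean square error between numeric class labels and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: 8 patients, binary label (1 = stroke), 3 binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "hypertension": [1, 1, 1, 1, 0, 0, 0, 0],  # fully aligned with the label
    "smoker":       [1, 1, 1, 0, 0, 0, 0, 1],  # partially aligned
    "noise":        [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}
selected = select_top_k(columns, labels, k=2)

# Hypothetical predictions from some classifier trained on the selected features.
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
report = {
    "accuracy": accuracy(labels, predictions),
    "micro_f1": micro_f1(labels, predictions),
    "rmse": rmse(labels, predictions),
}
print(selected, report)
```

In a comparison like the one described above, each feature selection method would produce its own `selected` subset, and each classifier would then be trained and scored on that subset so that the couples can be ranked by the same measures.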
Pages: 678-686
Number of pages: 9