Comparative analysis of feature selection techniques for COVID-19 dataset

Cited by: 2
Authors
Mohtasham, Farideh [1]
Pourhoseingholi, MohamadAmin [2]
Nazari, Seyed Saeed Hashemi [3]
Kavousi, Kaveh [4]
Zali, Mohammad Reza [1]
Affiliations
[1] Shahid Beheshti Univ Med Sci, Res Inst Gastroenterol & Liver Dis, Gastroenterol & Liver Dis Res Ctr, Tehran, Iran
[2] Univ Nottingham, Natl Inst Hlth & Care Res NIHR Nottingham Biomed R, Hearing Sci Mental Hlth & Clin Neurosci, Sch Med, Nottingham, England
[3] Shahid Beheshti Univ Med Sci SBMU, Dept Epidemiol, Sch Publ Hlth & Safety, Tehran, Iran
[4] Univ Tehran, Inst Biochem & Biophys IBB, Dept Bioinformat, Lab Complex Biol Syst & Bioinformat CBB, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Issue 01
Keywords
MODELS;
DOI
10.1038/s41598-024-69209-6
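Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code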
07 ; 0710 ; 09 ;
Abstract
In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
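The abstract reports three test-set metrics (accuracy, F1 score, AUC). As a minimal sketch of how such metrics are computed for a binary mortality label, the following uses only the Python standard library. The function names and the toy labels and scores are hypothetical and are not taken from the study; the study's own evaluation code is not described in this record.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = died, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
# Threshold the predicted scores at 0.5 to obtain hard class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

On this toy data the accuracy is 0.80, the F1 score is about 0.67, and the AUC is about 0.90. F1 is lower than accuracy here because the positive class is the minority and F1 ignores true negatives, whereas AUC depends only on how the scores rank positives against negatives and not on the 0.5 threshold.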
Pages: 20
Related papers
50 in total
  • [11] Genetic Algorithms for Feature Selection in the Classification of COVID-19 Patients
    Aliani, Cosimo
    Rossi, Eva
    Solinski, Mateusz
    Francia, Piergiorgio
    Lanata, Antonio
    Buchner, Teodor
    Bocchi, Leonardo
    BIOENGINEERING-BASEL, 2024, 11 (09):
  • [12] Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques
    Hayet-Otero, Miren
    Garcia-Garcia, Fernando
    Lee, Dae-Jin
    Martinez-Minaya, Joaquin
    Yandiola, Pedro Pablo Espana
    Landa, Isabel Urrutia
    Ermecheo, Monica Nieves
    Quintana, Jose Maria
    Menendez, Rosario
    Torres, Antoni
    Jorge, Rafael Zalacain
    Arostegui, Inmaculada
    PLOS ONE, 2023, 18 (04):
  • [13] Effects of dataset characteristics on the performance of feature selection techniques
    Oreski, Dijana
    Oreski, Stjepan
    Klicek, Bozidar
    APPLIED SOFT COMPUTING, 2017, 52 : 109 - 119
  • [14] Predicting Thalassemia Using Feature Selection Techniques: A Comparative Analysis
    Saleem, Muniba
    Aslam, Waqar
    Lali, Muhammad Ikram Ullah
    Rauf, Hafiz Tayyab
    Nasr, Emad Abouel
    DIAGNOSTICS, 2023, 13 (22)
  • [15] Investigating boosting techniques' efficacy in feature selection: A comparative analysis
    Ahmed, Ubaid
    Mahmood, Anzar
    Tunio, Majid Ali
    Hafeez, Ghulam
    Khan, Ahsan Raza
    Razzaq, Sohail
    ENERGY REPORTS, 2024, 11 : 3521 - 3532
  • [16] Prediction of Covid-19 and post Covid-19 patients with reduced feature extraction using Machine Learning Techniques
    Bano, Shehr
    Hussain, Syed Fawad
    2021 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT 2021), 2021, : 37 - 42
  • [17] Analysis of Various COVID-19 Prediction Techniques
    Arora R.K.
    Gupta M.K.
    Bhati B.S.
    Gupta, Manoj Kumar (manojkgupta5@gmail.com), 1600, Institute of Electronics Engineers of Korea (10): : 323 - 329
  • [18] Community detection using unsupervised machine learning techniques on COVID-19 dataset
    Laxmi Chaudhary
    Buddha Singh
    Social Network Analysis and Mining, 2021, 11
  • [19] Community detection using unsupervised machine learning techniques on COVID-19 dataset
    Chaudhary, Laxmi
    Singh, Buddha
    SOCIAL NETWORK ANALYSIS AND MINING, 2021, 11 (01)
  • [20] A Comprehensive Data Imbalance Analysis for Covid-19 Classification Dataset
    Tissir, Zineb
    Poudel, Sahadev
    Baidya, Ranjai
    Lee, Sang-Woong
    12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION, 2021, : 20 - 24