Comparative analysis of feature selection techniques for COVID-19 dataset

Times cited: 2
Authors
Mohtasham, Farideh [1]
Pourhoseingholi, MohamadAmin [2]
Nazari, Seyed Saeed Hashemi [3]
Kavousi, Kaveh [4]
Zali, Mohammad Reza [1]
Affiliations
[1] Shahid Beheshti Univ Med Sci, Res Inst Gastroenterol & Liver Dis, Gastroenterol & Liver Dis Res Ctr, Tehran, Iran
[2] Univ Nottingham, Natl Inst Hlth & Care Res NIHR Nottingham Biomed R, Hearing Sci Mental Hlth & Clin Neurosci, Sch Med, Nottingham, England
[3] Shahid Beheshti Univ Med Sci SBMU, Dept Epidemiol, Sch Publ Hlth & Safety, Tehran, Iran
[4] Univ Tehran, Inst Biochem & Biophys IBB, Dept Bioinformat, Lab Complex Biol Syst & Bioinformat CBB, Tehran, Iran
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 1
Keywords
MODELS
DOI
10.1038/s41598-024-69209-6
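Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]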
Subject classification codes
07; 0710; 09
Abstract
Machine learning (ML) has become a vital tool for early disease detection. Feature selection (FS) algorithms help keep predictive models accurate by identifying the most influential variables. Using a retrospective cohort of 4778 COVID-19 patients from Iran, this study compares how well various FS methods, including filter, embedded, and hybrid approaches, support the prediction of mortality. The study drew on 115 routine clinical, laboratory, and demographic features and used 13 ML models to evaluate the FS methods by classification accuracy, predictive accuracy, and statistical tests. A hybrid Boruta-VI feature selection method combined with the Random Forest algorithm performed best, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC of 0.95 on the test data. Age, oxygen saturation, albumin, neutrophil count, platelet count, and markers of kidney function emerged as important predictors of adverse outcomes. These findings highlight the potential of advanced FS techniques and ML models to improve early disease detection and inform clinical decision-making.
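The abstract reports accuracy, F1 score, and AUC for a binary mortality outcome. As a reminder of how these three metrics are defined, here is a minimal standard-library sketch; the labels, scores, and the 0.5 decision threshold are made-up illustrative values (an assumption), not data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class, e.g. death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores above a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical predicted risks (not study data)
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.3, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.9375
```

Because F1 ignores true negatives, it can sit well below accuracy when the positive class is the minority, whereas AUC is threshold-free and summarizes how well the scores rank positives above negatives.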
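The abstract does not describe how the hybrid Boruta-VI method is built, so the sketch below illustrates only the general Boruta idea of comparing each real feature with shuffled "shadow" copies. To stay self-contained it swaps in absolute Pearson correlation as the importance score instead of random-forest importance, which makes it a filter-style approximation rather than Boruta proper; the function names, feature names, and synthetic data are all hypothetical.

```python
import math
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation with the label. Stand-in importance score
    (assumption): Boruta proper uses random-forest importance instead."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def shadow_hits(columns, y, n_iter=50, seed=0):
    """Each iteration shuffles a copy of every feature (its 'shadow') and counts
    a hit for each real feature scoring above the best shadow score."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    # Correlation is deterministic, so real scores are computed once here;
    # random-forest importance would instead vary from run to run.
    real_scores = {name: abs_correlation(v, y) for name, v in columns.items()}
    for _ in range(n_iter):
        shadow_best = 0.0
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)  # breaks any link between the feature and y
            shadow_best = max(shadow_best, abs_correlation(shadow, y))
        for name, score in real_scores.items():
            if score > shadow_best:
                hits[name] += 1
    return hits


def binomial_tail_ge(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_tail_le(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def decide(hits, n_iter, alpha=0.05):
    """Test each hit count against chance (p = 0.5). Simplified: no
    multiple-testing correction and no iterative removal of rejected features."""
    decisions = {}
    for name, k in hits.items():
        if binomial_tail_ge(k, n_iter) < alpha:
            decisions[name] = "confirmed"
        elif binomial_tail_le(k, n_iter) < alpha:
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return decisions


# Hypothetical synthetic data: 'age' is tied to the label, 'noise' is not.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
columns = {
    "age": [40 + 20 * label + rng.gauss(0, 5) for label in y],
    "noise": [rng.gauss(0, 1) for _ in y],
}
hits = shadow_hits(columns, y, n_iter=50)
decisions = decide(hits, n_iter=50)
print(hits, decisions)
```

Comparing against the best shadow rather than a fixed cutoff lets the data set its own noise floor; a feature only counts as informative if it consistently beats what pure chance can produce.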
Pages: 20