OCtS: an alternative of the t-Score method sensitive to outliers and correlation in feature selection

Authors
Demirarslan, Mert [1]
Suner, Asli [1]
Affiliation
[1] Ege Univ, Fac Med, Dept Biostat & Med Informat, Izmir, Turkey
Keywords
Data preprocessing; Missing value; Class noise; Class imbalance; Feature selection; Ensemble learning; Classification
DOI
10.1080/03610918.2022.2046087
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
A wide range of issues, including missing values, class noise, class imbalance, outliers, correlated variables, and irrelevant variables, can degrade the overall performance of classification algorithms for disease diagnosis. This study proposes a new technique, an alternative to the t-score method, that aims to improve the performance of ensemble learning classification algorithms by removing irrelevant variables. To this end, three publicly available datasets from the medical domain, differing in sample size, number of variables, and data preprocessing problems, were selected and processed with the newly proposed feature selection method, Outliers and Correlation t-Score (OCtS). Six widely used ensemble learning algorithms (Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting Machine, Light Gradient Boosting Machine, CatBoost, and Bagging) were then applied to disease diagnosis classification, and their performance metrics were measured. The results indicate that the classification performance of all six ensemble learning algorithms increased significantly when OCtS was employed, and that OCtS outperformed the standard t-score method across all datasets (p = 0.0001). We conclude that combining data preprocessing methods with OCtS yields better performance when ensemble learning algorithms are used for disease diagnosis classification.
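The abstract does not state how OCtS modifies the t-score, so only the baseline it is compared against is sketched here. Assuming the common two-sample form with an unequal-variance denominator, t_j = |mean_j(class 1) - mean_j(class 0)| / sqrt(s_j(class 1)^2 / n_1 + s_j(class 0)^2 / n_0), a t-score filter ranks every feature by how well it separates the two diagnostic classes and keeps the top k. This is a minimal Python sketch, not the authors' code: the functions `t_scores` and `select_top_k` and the toy data are hypothetical, and the outlier and correlation adjustments of OCtS are deliberately not modeled.

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """Score each feature (column of X) with a two-sample t-statistic.

    Hypothetical helper illustrating a standard t-score filter (not OCtS).
    X: list of samples, each a list of numeric feature values.
    y: list of binary class labels (0 or 1), one per sample.
    Each class needs at least two samples, because statistics.variance
    uses the n - 1 (sample) denominator.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        diff = abs(mean(a) - mean(b))
        # Assumed unequal-variance (Welch-style) standard error of the mean difference.
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            # Constant within both classes: perfectly separating if the means differ.
            scores.append(math.inf if diff > 0 else 0.0)
        else:
            scores.append(diff / se)
    return scores


def select_top_k(X, y, k):
    """Return the column indices of the k features with the largest t-scores."""
    scores = t_scores(X, y)
    # sorted() is stable, so tied features keep their original column order.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes; features 1 and 2 do not.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.1],
    [0.9, 6.0, 0.2],
    [3.1, 5.5, 0.2],
    [2.9, 4.5, 0.3],
    [3.0, 5.0, 0.1],
]
y = [0, 0, 0, 1, 1, 1]

print(select_top_k(X, y, 1))  # [0]
```

Because this baseline relies on class means and variances, a single extreme value can inflate or shrink a feature's score, and two highly correlated features receive similar scores and may both be kept; these are presumably the weaknesses that the outlier and correlation terms in the OCtS name are meant to address.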
Pages: 1409-1422
Number of pages: 14