Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

Cited by: 0
Authors
Seba, P. Antony [1]
Benifa, J. V. Bibal [2]
Affiliations
[1] Indian Inst Informat Technol Kottayam, Dept Comp Sci & Engn, Kottayam, Kerala, India
[2] Informat Technol Kottayam, Dept Comp Sci & Engn, Kottayam, Kerala, India
Keywords
data contemplation; DEA; feature selection; TOPSIS; CLASSIFIER;
DOI
10.4218/etrij.2022-0018
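Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract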
This article scrutinizes a chronic kidney disease (CKD) dataset in detail to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis (DEA), which reduces the number of rows. Column reduction is achieved by ranking the attributes with five feature selection methods: extra-trees classifier (ETC), recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methods are then ranked via the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) with weight optimization to identify the optimal features for model building from the CKD dataset, with the aim of better prediction when diagnosing the severity of the disease. An efficient hybrid ensemble classifier and novel similarity-based classifiers are built on the pruned dataset, and their results are compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a prediction accuracy of 98.31% for the features selected by ETC, which TOPSIS ranks as the best method.
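The TOPSIS ranking step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the textbook TOPSIS procedure (vector normalization, weighting, distance to the ideal best and ideal worst, closeness score). The method names, the criteria, the numbers, and the equal weights are all hypothetical placeholders chosen for illustration; they are not taken from the article, and the article's weight optimization is not reproduced here.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a TOPSIS closeness score in [0, 1] for each row of `matrix`.

    matrix  : list of rows, one row per alternative, one column per criterion
    weights : one weight per criterion
    benefit : one bool per criterion, True if larger is better, False if smaller is better
    """
    n_cols = len(matrix[0])
    # Step 1: vector-normalise each column, then apply the criterion weight.
    # (Assumes no column is all zeros.)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]

    # Step 2: ideal best and ideal worst value for every criterion.
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]

    # Step 3: Euclidean distance to both ideals, then relative closeness.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        # If all alternatives are identical both distances are zero; score them 0.5.
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores


# Hypothetical placeholder data, NOT results from the article.
# Columns: accuracy (benefit), F1 score (benefit), number of features kept (cost).
methods = ["method_A", "method_B", "method_C"]
matrix = [
    [0.95, 0.94, 8],
    [0.92, 0.90, 12],
    [0.88, 0.85, 15],
]
weights = [1 / 3, 1 / 3, 1 / 3]   # equal weights, an assumption for this sketch
benefit = [True, True, False]

scores = topsis(matrix, weights, benefit)
# Sort method names from highest to lowest closeness score.
ranking = [name for _, name in sorted(zip(scores, methods), reverse=True)]
print(ranking)
```

A higher closeness score means the alternative lies nearer to the ideal best and farther from the ideal worst, so the first name in `ranking` is the preferred method under the chosen weights.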
Pages: 448-461
Page count: 14
Related papers
50 in total
  • [1] Data analytics and clinical feature ranking of medical records of patients with sepsis
    Chicco, Davide
    Oneto, Luca
    BIODATA MINING, 2021, 14 (01)
  • [2] Data analytics and clinical feature ranking of medical records of patients with sepsis
    Davide Chicco
    Luca Oneto
    BioData Mining, 14
  • [3] GMDH-based feature ranking and selection for improved classification of medical data
    Abdel-Aal, RE
    JOURNAL OF BIOMEDICAL INFORMATICS, 2005, 38 (06) : 456 - 468
  • [4] Feature selection considering the composition of feature relevancy
    Gao, Wanfu
    Hu, Liang
    Zhang, Ping
    He, Jialong
    PATTERN RECOGNITION LETTERS, 2018, 112 : 70 - 74
  • [5] Challenges of Feature Selection for Big Data Analytics
    Li J.
    Liu H.
    1600, Institute of Electrical and Electronics Engineers Inc., United States (32): : 9 - 15
  • [6] Feature Selection Techniques for Big Data Analytics
    Albattah, Waleed
    Khan, Rehan Ullah
    Alsharekh, Mohammed F.
    Khasawneh, Samer F.
    ELECTRONICS, 2022, 11 (19)
  • [7] Challenges of Feature Selection for Big Data Analytics
    Li, Jundong
    Liu, Huan
    IEEE INTELLIGENT SYSTEMS, 2017, 32 (02) : 9 - 15
  • [8] Ensemble feature ranking applied to medical data
    Santos, Vitor
    Datia, Nuno
    Pato, M. P. M.
    CONFERENCE ON ELECTRONICS, TELECOMMUNICATIONS AND COMPUTERS - CETC 2013, 2014, 17 : 223 - 230
  • [9] Feature selection considering weighted relevancy
    Zhang, Ping
    Gao, Wanfu
    Liu, Guixia
    APPLIED INTELLIGENCE, 2018, 48 (12) : 4615 - 4625
  • [10] Feature selection considering weighted relevancy
    Ping Zhang
    Wanfu Gao
    Guixia Liu
    Applied Intelligence, 2018, 48 : 4615 - 4625