Applications of Feature Selection Techniques on Large Biomedical Datasets

Cited by: 0
Authors
Ewen, Nicolas [1]
Abdou, Tamer [1, 2]
Bener, Ayse [1]
Affiliations
[1] Ryerson Univ, Data Sci Lab, Toronto, ON M5B 2K3, Canada
[2] Arish Univ, Fac Sci, North Sinai 45516, Egypt
Source
Keywords
Feature selection; Bio-medical; Large dataset
DOI
10.1007/978-3-030-18305-9_57
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The main goal of this paper is to determine which feature selection algorithm is best suited to large biomedical datasets, where feature selection can play an important role in analysis. We applied four feature selection techniques to large biomedical datasets: Information Gain, Chi-Squared, Markov Blanket Discovery, and Recursive Feature Elimination. We measured the efficiency of each technique's selection, the stability of the algorithms, and the quality of the selected features. Of the four techniques, the Information Gain and Chi-Squared filters were the most efficient and stable. Markov Blanket Discovery and Recursive Feature Elimination both took significantly longer to select features and were less stable. Recursive Feature Elimination selected the highest-quality features, followed by Information Gain and Chi-Squared, with Markov Blanket Discovery placing last. For educational purposes (e.g., instructors in the biomedical field teaching data analysis techniques), we recommend the Information Gain or Chi-Squared filter. For research or analysis, we recommend one of the filters or Recursive Feature Elimination, depending on the situation. We do not recommend Markov Blanket Discovery for the situations tested in this trial, bearing in mind that the experiments were not exhaustive.
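The two filters that came out best on efficiency and stability both score each feature independently against the class label and keep the top-ranked features. The following is a minimal, standard-library Python sketch of that idea, assuming discrete (categorical or pre-binned) features; it is not the authors' implementation, and the abstract does not say how features were discretized, how many were kept, or which stability measure was used. All function names (`information_gain`, `chi_squared`, `top_k`, `jaccard_stability`) are hypothetical, and mean pairwise Jaccard similarity between selected subsets is assumed here as one common way to quantify stability. Recursive Feature Elimination and Markov Blanket Discovery are not shown, since they require a trained model or conditional-independence tests.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_x P(X=x) * H(Y | X=x).

    Assumes `feature` holds discrete values (hypothetical preprocessing:
    continuous measurements would need to be binned first).
    """
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature-value x class contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))  # missing cells count as 0
    stat = 0.0
    for x, fx in feature_counts.items():
        for y, fy in label_counts.items():
            expected = fx * fy / n  # always > 0 because fx, fy > 0
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def top_k(columns, labels, score, k):
    """Filter selection: rank feature indices by score (descending) and keep k.

    Ties are broken by the lower feature index so the result is deterministic.
    """
    ranked = sorted(range(len(columns)), key=lambda j: (-score(columns[j], labels), j))
    return set(ranked[:k])


def jaccard_stability(subsets):
    """Assumed stability measure: mean pairwise Jaccard similarity of
    non-empty selected-feature sets (e.g. one set per data resample)."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Toy example: three discrete features for six samples (hypothetical data).
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1],  # perfectly predictive of the label
    [0, 1, 0, 1, 0, 1],  # weakly associated with the label
    [1, 1, 1, 1, 1, 1],  # constant, carries no information
]
print(top_k(columns, labels, information_gain, 1))  # {0}
print(top_k(columns, labels, chi_squared, 2))  # {0, 1}
```

Because each filter score is computed once per feature without fitting a model, filters scale roughly linearly in the number of features, which is one plausible reason they ran faster than the model-based Recursive Feature Elimination and the search-based Markov Blanket Discovery. In a stability experiment, `top_k` would be run on several resamples of the data and the resulting sets passed to `jaccard_stability`.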
Pages: 543-548
Page count: 6