Threshold-based feature selection techniques for high-dimensional bioinformatics data

Cited by: 42
Authors
Van Hulse J. [1]
Khoshgoftaar T.M. [1]
Napolitano A. [1]
Wald R. [1]
Affiliations
[1] Data Mining and Machine Learning Laboratory, Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
Keywords
Bioinformatics; Correlation matrix; Feature selection; Frobenius norm; Kendall's Tau rank correlation; Threshold-based feature selection;
DOI
10.1007/s13721-012-0006-6
Abstract
Analysis conducted for bioinformatics applications often requires the use of feature selection methodologies to handle datasets with very high dimensionality. We propose 11 new threshold-based feature selection techniques and compare the performance of these new techniques to that of six standard filter-based feature selection procedures. Unlike other comparisons of feature selection techniques, we directly compare the feature rankings produced by each technique using Kendall's Tau rank correlation, showing that the newly proposed techniques exhibit substantially different behaviors than the standard filter-based feature selection methods. Our experiments consider 17 different bioinformatics datasets, and the similarities of the feature selection techniques are analyzed using the Frobenius norm. The feature selection techniques are also compared by using Naive Bayes and Support Vector Machine algorithms to learn from the training datasets. The experimental results show that the new procedures perform very well compared to the standard filters, and hence are useful feature selection methodologies for the analysis of bioinformatics data. © 2012 Springer-Verlag.
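The ranking comparison described in the abstract can be illustrated with a minimal sketch. It computes Kendall's Tau between the feature rankings produced by several rankers, collects the pairwise values into a correlation matrix per dataset, and measures how far apart two such matrices are with the Frobenius norm. The ranker names, the toy rankings, and the choice to compare per-dataset matrices are assumptions made for illustration; the abstract does not specify how the paper applies the Frobenius norm, and this is not the authors' implementation. The sketch uses the Tau-a variant and assumes rankings without ties.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(rank_a, rank_b):
    """Kendall's Tau-a between two rankings of the same features (no ties assumed)."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair of features is concordant when both rankers order it the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_matrix(rankings):
    """Pairwise Kendall's Tau matrix; rows and columns follow sorted ranker names."""
    names = sorted(rankings)
    return [[kendall_tau(rankings[a], rankings[b]) for b in names] for a in names]


def frobenius_distance(m1, m2):
    """Frobenius norm of the element-wise difference of two equal-sized matrices."""
    return sqrt(sum((x - y) ** 2
                    for row1, row2 in zip(m1, m2)
                    for x, y in zip(row1, row2)))


# Hypothetical data: position i holds the rank a ranker assigns to feature i.
dataset_1 = {
    "ranker_A": [1, 2, 3, 4, 5],
    "ranker_B": [1, 2, 3, 5, 4],
    "ranker_C": [5, 4, 3, 2, 1],
}
dataset_2 = {
    "ranker_A": [1, 2, 3, 4, 5],
    "ranker_B": [2, 1, 3, 4, 5],
    "ranker_C": [1, 3, 2, 5, 4],
}

matrix_1 = tau_matrix(dataset_1)
matrix_2 = tau_matrix(dataset_2)
distance = frobenius_distance(matrix_1, matrix_2)
print(f"Frobenius distance between correlation matrices: {distance:.3f}")
```

A small distance would suggest that the rankers relate to one another similarly on both datasets; a large one would suggest that their agreement pattern is dataset-dependent. With thousands of features, the quadratic pair loop becomes slow, and a library routine that handles ties would be the more practical choice.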
Pages: 47-61
Number of pages: 14