Threshold-based feature selection techniques for high-dimensional bioinformatics data

Cited by: 42
Authors
Van Hulse J. [1]
Khoshgoftaar T.M. [1]
Napolitano A. [1]
Wald R. [1]
Affiliation
[1] Data Mining and Machine Learning Laboratory, Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
Keywords
Bioinformatics; Correlation matrix; Feature selection; Frobenius norm; Kendall's Tau rank correlation; Threshold-based feature selection
DOI
10.1007/s13721-012-0006-6
Abstract
Analysis conducted for bioinformatics applications often requires the use of feature selection methodologies to handle datasets with very high dimensionality. We propose 11 new threshold-based feature selection techniques and compare the performance of these new techniques to that of six standard filter-based feature selection procedures. Unlike other comparisons of feature selection techniques, we directly compare the feature rankings produced by each technique using Kendall's Tau rank correlation, showing that the newly proposed techniques exhibit substantially different behaviors than the standard filter-based feature selection methods. Our experiments consider 17 different bioinformatics datasets, and the similarities of the feature selection techniques are analyzed using the Frobenius norm. The feature selection techniques are also compared by using Naive Bayes and Support Vector Machine algorithms to learn from the training datasets. The experimental results show that the new procedures perform very well compared to the standard filters, and hence are useful feature selection methodologies for the analysis of bioinformatics data. © 2012 Springer-Verlag.
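The ranking comparison described in the abstract can be illustrated with a minimal sketch. It assumes tie-free rankings and uses the tau-a form of Kendall's Tau. The ranker names and rank values are hypothetical. The abstract does not say how the Frobenius norm is applied, so the sketch assumes it measures the distance between two ranker-by-ranker correlation matrices. This is not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(rank_a, rank_b):
    """Kendall's Tau (tau-a) between two rankings of the same features.

    Assumes no tied ranks; position i holds the rank given to feature i.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_matrix(rankings):
    """Pairwise Kendall's Tau matrix for a dict of ranker name -> ranking."""
    names = list(rankings)
    return [[kendall_tau(rankings[a], rankings[b]) for b in names] for a in names]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return sqrt(sum(x * x for row in matrix for x in row))


def frobenius_distance(m1, m2):
    """Frobenius norm of the element-wise difference of two equal-shape matrices."""
    return frobenius_norm(
        [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
    )


# Hypothetical rankings of five features by three rankers on two datasets.
dataset_1 = {
    "ranker_a": [1, 2, 3, 4, 5],
    "ranker_b": [1, 2, 3, 5, 4],
    "ranker_c": [5, 4, 3, 2, 1],
}
dataset_2 = {
    "ranker_a": [2, 1, 3, 4, 5],
    "ranker_b": [1, 2, 3, 4, 5],
    "ranker_c": [5, 4, 3, 2, 1],
}

corr_1 = tau_matrix(dataset_1)
corr_2 = tau_matrix(dataset_2)
print(frobenius_distance(corr_1, corr_2))
```

A small distance suggests the rankers relate to one another in much the same way on both datasets; a large one suggests their agreement pattern depends on the data.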
Pages: 47-61
Page count: 15