Benchmark for filter methods for feature selection in high-dimensional classification data

Cited by: 337
Authors
Bommert, Andrea [1]
Sun, Xudong [2]
Bischl, Bernd [2]
Rahnenfuehrer, Joerg [1]
Lang, Michel [1]
Affiliations
[1] TU Dortmund Univ, Dept Stat, D-44221 Dortmund, Germany
[2] Ludwig Maximilians Univ Munchen, Dept Stat, Ludwigstr 33, D-80539 Munich, Germany
Keywords
Feature selection; Filter methods; High-dimensional data; Benchmark; INFORMATION; ALGORITHMS; RELEVANCE; MODEL;
DOI
10.1016/j.csda.2019.106839
CLC number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Feature selection is one of the most fundamental problems in machine learning and has drawn increasing attention due to high-dimensional data sets emerging from different fields like bioinformatics. For feature selection, filter methods play an important role, since they can be combined with any machine learning model and can heavily reduce run time of machine learning algorithms. The aim of the analyses is to review how different filter methods work, to compare their performance with respect to both run time and predictive accuracy, and to provide guidance for applications. Based on 16 high-dimensional classification data sets, 22 filter methods are analyzed with respect to run time and accuracy when combined with a classification method. It is concluded that there is no group of filter methods that always outperforms all other methods, but recommendations on filter methods that perform well on many of the data sets are made. Also, groups of filters that are similar with respect to the order in which they rank the features are found. For the analyses, the R machine learning package mlr is used. It provides a uniform programming API and therefore is a convenient tool to conduct feature selection using filter methods. (C) 2019 The Authors. Published by Elsevier B.V.
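To make the idea of a filter method concrete, the sketch below scores each feature independently of any classifier, keeps the top-ranked ones, and reduces the data before model fitting. It is only an illustration under stated assumptions: the scoring rule (absolute difference of class means divided by the overall standard deviation) is a simple stand-in and is not claimed to be one of the 22 filters in the benchmark, the function names `score_features`, `select_top_k`, and `keep_columns` are hypothetical, and the code is plain Python rather than the R package mlr used in the paper.

```python
from statistics import mean, pstdev


def score_features(rows, labels):
    """Score each feature by |difference of class means| / overall std.

    Assumption: binary labels; this scoring rule is illustrative only.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary labels only")
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        group_a = [v for v, y in zip(column, labels) if y == classes[0]]
        group_b = [v for v, y in zip(column, labels) if y == classes[1]]
        spread = pstdev(column)
        # A constant feature cannot separate the classes, so it scores zero.
        if spread > 0:
            scores.append(abs(mean(group_a) - mean(group_b)) / spread)
        else:
            scores.append(0.0)
    return scores


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = score_features(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def keep_columns(rows, columns):
    """Reduce the data to the selected feature columns."""
    return [[row[j] for j in columns] for row in rows]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is constant,
    # feature 2 is mostly noise.
    rows = [[1.0, 5.0, 0.2], [1.1, 5.0, 0.9], [3.0, 5.0, 0.1], [3.2, 5.0, 0.8]]
    labels = [0, 0, 1, 1]
    chosen = select_top_k(rows, labels, k=2)
    print(chosen)
    print(keep_columns(rows, chosen))
```

Because the scores depend only on the data and not on a learner, the reduced data can be passed to any classifier, which is the property the abstract highlights for filter methods.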
Pages: 19