Large-scale attribute selection using wrappers

Cited by: 166
Authors
Guetlein, Martin [1 ]
Frank, Eibe [2 ]
Hall, Mark [3 ]
Karwath, Andreas [1 ]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Dept Comp Sci, D-7800 Freiburg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Pentaho Corp, Orlando, FL USA
Keywords
PREDICTION; CANCER;
DOI
10.1109/CIDM.2009.4938668
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it runs the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they incur a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, such as microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
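The abstract does not spell out how Linear Forward Selection limits the expansions in each step. The sketch below illustrates one plausible reading, under loudly stated assumptions: attributes are first ranked by their single-attribute wrapper score, and greedy forward selection then only expands over the top `k` ranked attributes. The function names, the parameter `k`, and the toy evaluator `toy_score` are all hypothetical and stand in for a real wrapper evaluator (for example, cross-validated accuracy of a classifier); this is not the authors' implementation.

```python
from typing import Callable, FrozenSet, List, Tuple

Subset = FrozenSet[int]
Evaluator = Callable[[Subset], float]


def forward_selection(
    candidates: List[int], evaluate: Evaluator
) -> Tuple[Subset, float, int]:
    """Greedy forward selection restricted to `candidates`.

    Returns the chosen subset, its score, and the number of evaluate() calls.
    """
    selected: Subset = frozenset()
    best_score = evaluate(selected)
    n_evals = 1
    remaining = list(candidates)
    while remaining:
        # One "expansion" per remaining attribute in this step.
        scored = [(evaluate(selected | {a}), a) for a in remaining]
        n_evals += len(scored)
        # Highest score wins; ties go to the lower attribute index.
        score, attr = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no expansion improves the score
        selected, best_score = selected | {attr}, score
        remaining.remove(attr)
    return selected, best_score, n_evals


def linear_forward_selection(
    n_attributes: int, evaluate: Evaluator, k: int
) -> Tuple[Subset, float, int]:
    """Assumed variant: rank attributes individually, then search the top k."""
    ranked = sorted(
        range(n_attributes),
        key=lambda a: evaluate(frozenset({a})),
        reverse=True,
    )
    subset, score, n_evals = forward_selection(ranked[:k], evaluate)
    # The initial ranking costs one evaluation per attribute.
    return subset, score, n_evals + n_attributes


# Hypothetical stand-in for a wrapper evaluator: three attributes are
# informative, and every selected attribute carries a small penalty.
WEIGHTS = {2: 30, 5: 20, 7: 10}


def toy_score(subset: Subset) -> float:
    return sum(WEIGHTS.get(a, 0) for a in subset) - len(subset)


if __name__ == "__main__":
    print(forward_selection(list(range(20)), toy_score))
    print(linear_forward_selection(20, toy_score, k=5))
```

On this toy problem with 20 attributes, both searches return the subset {2, 5, 7}, but the restricted search needs 35 evaluations instead of 75. With a real wrapper, each evaluation is a full internal cross-validation, which is where the runtime saving would come from.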
Pages: 332-339
Number of pages: 8