Large-scale attribute selection using wrappers

Cited by: 166
Authors
Guetlein, Martin [1]
Frank, Eibe [2]
Hall, Mark [3]
Karwath, Andreas [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Dept Comp Sci, D-7800 Freiburg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Pentaho Corp, Orlando, FL USA
Keywords
PREDICTION; CANCER;
DOI
10.1109/CIDM.2009.4938668
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets, and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
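The idea of limiting attribute expansions per forward selection step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how candidates are limited, so ranking attributes once by their single-attribute score and keeping only the top k is an assumption made here, as are the names linear_forward_selection, toy_evaluate, and k. In a real wrapper, the evaluation function would be the cross-validated accuracy of the chosen classifier on the given subset; a toy scoring function stands in for it here.

```python
from typing import Callable, FrozenSet, List


def linear_forward_selection(
    n_attributes: int,
    evaluate: Callable[[FrozenSet[int]], float],
    k: int,
) -> List[int]:
    """Greedy forward selection restricted to k pre-ranked candidates.

    Assumption (not from the abstract): candidates are ranked once by
    their single-attribute score and only the top k are ever expanded.
    """
    # Rank every attribute by the score it achieves on its own.
    ranked = sorted(
        range(n_attributes),
        key=lambda a: evaluate(frozenset([a])),
        reverse=True,
    )
    candidates = ranked[:k]

    selected: List[int] = []
    best_score = float("-inf")
    while candidates:
        # Expand the current subset only by the remaining candidates.
        scored = [(evaluate(frozenset(selected + [a])), a) for a in candidates]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the subset: stop
        selected.append(attr)
        candidates.remove(attr)
        best_score = score
    return selected


# Hypothetical stand-in for a wrapper evaluator: three useful attributes
# and a small penalty per selected attribute.
WEIGHTS = {0: 0.3, 3: 0.2, 5: 0.1}


def toy_evaluate(subset: FrozenSet[int]) -> float:
    return sum(WEIGHTS.get(a, 0.0) for a in subset) - 0.01 * len(subset)


if __name__ == "__main__":
    print(linear_forward_selection(8, toy_evaluate, k=3))
```

With this toy evaluator, restricting the search to k=3 candidates selects attributes 0, 3, and 5, the same subset an unrestricted search over all 8 attributes finds, while evaluating fewer subsets per step.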
Pages: 332-339
Number of pages: 8
Related papers
50 items in total
  • [21] MISSION: Ultra Large-Scale Feature Selection using Count-Sketches
    Aghazadeh, Amirali
    Spring, Ryan
    LeJeune, Daniel
    Dasarathy, Gautam
    Shrivastava, Anshumali
    Baraniuk, Richard G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [22] Feature selection on large-scale issues using clustering and meta-algorithms
    Akhlaghian, Fardin
    Amiri, Shabnam
    AMAZONIA INVESTIGA, 2018, 7 (13): : 17 - 30
  • [23] Partition Selection for Large-Scale Data Management Using KNN Join Processing
    Hu, Yue
    Peng, Ge
    Wang, Zehua
    Cui, Yanrong
    Qin, Hang
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [24] Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem
    Subramanian Appavu Alias Balamurugan
    Ramasamy Rajaram
    Machine Intelligence Research, 2009, 6 (01) : 62 - 71
  • [25] Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem
    Balamurugan, Subramanian Appavu Alias
    Rajaram, Ramasamy
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2009, 6 (01) : 62 - 71
  • [26] A Joint MLE Approach to Large-Scale Structured Latent Attribute Analysis
    Gu, Yuqi
    Xu, Gongjun
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (541) : 746 - 760
  • [27] Attribute annotation on large-scale image database by active knowledge transfer
    Jiang, Huajie
    Wang, Ruiping
    Li, Yan
    Liu, Haomiao
    Shan, Shiguang
    Chen, Xilin
    IMAGE AND VISION COMPUTING, 2018, 78 : 1 - 13
  • [28] Feature selection for large-scale data sets in GrC
    Liang, Jiye
    2012 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC 2012), 2012, : 2 - 7
  • [29] Adaptive Classifier Selection in Large-Scale Hierarchical Classification
    Partalas, Ioannis
    Babbar, Rohit
    Gaussier, Eric
    Amblard, Cecile
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT III, 2012, 7665 : 612 - 619
  • [30] LARGE-SCALE SELECTION SYNCHRONY OF TETRAHYMENA-THERMOPHILA
    HILL, RJ
    KROFT, T
    ZUKER, M
    SMITH, ICP
    JOURNAL OF CELL SCIENCE, 1986, 84 : 237 - 251