Large-scale attribute selection using wrappers

Times cited: 166
Authors
Guetlein, Martin [1]
Frank, Eibe [2]
Hall, Mark [3]
Karwath, Andreas [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Dept Comp Sci, D-7800 Freiburg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Pentaho Corp, Orlando, FL USA
Keywords
PREDICTION; CANCER
DOI
10.1109/CIDM.2009.4938668
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it runs the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve higher accuracy than filters, they incur a high computational cost. Overfitting and long runtimes are particularly problematic on high-dimensional datasets such as microarray data. We investigate Linear Forward Selection, a technique that reduces the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets, and can even increase accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
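The abstract describes Linear Forward Selection only at a high level. The following is a minimal Python sketch of one reading of it, assuming a fixed candidate set: attributes are ranked once by the score each achieves on its own, and every forward step only tries adding one of the k top-ranked attributes instead of every remaining attribute. The `evaluate` callable is a hypothetical stand-in for the wrapper's internal cross-validation of the target classifier, and the optional `max_size` argument mimics the explicit subset size variant by forcing the search to continue to, and stop at, a precomputed size. The names `rank_attributes`, `linear_forward_selection`, `evaluate`, `k`, and `max_size` are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

# Hypothetical evaluator: maps an attribute subset to an estimated accuracy.
# In a real wrapper this would be internal cross-validation of the classifier.
Evaluator = Callable[[FrozenSet[int]], float]


def rank_attributes(attributes: Sequence[int], evaluate: Evaluator) -> List[int]:
    """Rank attributes by the score each one achieves on its own (best first)."""
    return sorted(attributes, key=lambda a: evaluate(frozenset([a])), reverse=True)


def linear_forward_selection(
    attributes: Sequence[int],
    evaluate: Evaluator,
    k: int,
    max_size: Optional[int] = None,
) -> FrozenSet[int]:
    """Greedy forward selection restricted to the k top-ranked attributes.

    Without max_size, the search stops when no candidate improves the score.
    With max_size (assumed to be precomputed elsewhere), the best candidate is
    added at every step until the subset reaches exactly that size.
    """
    candidates = rank_attributes(attributes, evaluate)[:k]  # fixed candidate set
    selected: FrozenSet[int] = frozenset()
    best_score = evaluate(selected)
    while candidates and (max_size is None or len(selected) < max_size):
        # At most k expansions per step instead of all remaining attributes.
        score, attr = max((evaluate(selected | {a}), a) for a in candidates)
        if max_size is None and score <= best_score:
            break  # no improvement: stop as in standard forward selection
        selected = selected | {attr}
        best_score = score
        candidates.remove(attr)
    return selected


if __name__ == "__main__":
    # Toy evaluator (hypothetical): each attribute adds a fixed gain, and every
    # selected attribute costs 15 points, so large subsets are penalised.
    gains = {0: 50, 1: 30, 2: 10, 3: 5, 4: 0}

    def toy_evaluate(subset: FrozenSet[int]) -> float:
        return sum(gains[a] for a in subset) - 15 * len(subset)

    print(sorted(linear_forward_selection(range(5), toy_evaluate, k=3)))  # [0, 1]
```

With the toy evaluator, plain search stops at {0, 1} because adding attribute 2 lowers the score, while passing `max_size=3` forces the third expansion. Restricting candidates to the top k is what cuts the per-step cost from the number of remaining attributes down to at most k evaluations.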
Pages: 332-339
Number of pages: 8
Related papers (50 in total)
  • [1] LARGE-SCALE RANKING AND SELECTION USING CLOUD COMPUTING
    Luo, Jun
    Hong, L. Jeff
    PROCEEDINGS OF THE 2011 WINTER SIMULATION CONFERENCE (WSC), 2011, : 4046 - 4056
  • [2] Attribute Description Service for Large-Scale Networks
    Kline, Donald
    Quan, John
    HUMAN CENTERED DESIGN (HCD), 2011, 6776 : 519 - 528
  • [3] Large-scale feature selection using evolved neural networks
    Stathakis, Demetris
    Topouzelis, Kostas
    Karathanassi, Vassilia
    IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XII, 2006, 6365
  • [4] Large-Scale Analyses of Positive Selection Using Codon Models
    Studer, Romain A.
    Robinson-Rechavi, Marc
    EVOLUTIONARY BIOLOGY: CONCEPT, MODELING, AND APPLICATION, 2009, : 217 - 235
  • [5] Selection and Execution of large-scale projects
    Ahrens, G. -A.
    Beckmann, K. J.
    Boltze, M.
    Eisenkopf, A.
    Fricke, H.
    Knieps, G.
    Knorr, A.
    Mitusch, K.
    Oeter, S.
    Radermacher, F. -J.
    Sieg, G.
    Siegmann, J.
    Schlag, B.
    Stoelzle, W.
    Vallee, D.
    Winner, H.
    BAUINGENIEUR, 2015, 90 : 129 - 139
  • [6] Large-scale resource selection in grids
    Roumani, AM
    Skillicorn, DB
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2004: OTM 2004 WORKSHOPS, PROCEEDINGS, 2004, 3292 : 154 - 164
  • [7] Large-Scale Loan Portfolio Selection
    Sirignano, Justin A.
    Tsoukalas, Gerry
    Giesecke, Kay
    OPERATIONS RESEARCH, 2016, 64 (06) : 1239 - 1255
  • [8] Using Data Accessibility for Resource Selection in Large-Scale Distributed Systems
    Kim, Jinoh
    Chandra, Abhishek
    Weissman, Jon B.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2009, 20 (06) : 788 - 801
  • [9] PLAR: Parallel Large-scale Attribute Reduction on Cloud Systems
    Zhang, Junbo
    Li, Tianrui
    Pan, Yi
    2013 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2013, : 184 - 191
  • [10] Automatic Wrappers for Large Scale Web Extraction
    Dalvi, Nilesh
    Kumar, Ravi
    Soliman, Mohamed
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 4 (04) : 219 - 230