Large-scale attribute selection using wrappers

Cited by: 166
Authors
Guetlein, Martin [1]
Frank, Eibe [2]
Hall, Mark [3]
Karwath, Andreas [1]
Affiliations
[1] Albert Ludwigs Univ Freiburg, Dept Comp Sci, D-7800 Freiburg, Germany
[2] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[3] Pentaho Corp, Orlando, FL USA
Keywords
PREDICTION; CANCER;
DOI
10.1109/CIDM.2009.4938668
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it runs the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they incur a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, such as microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets, and can even increase accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
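The idea of limiting attribute expansions per step can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the abstract does not say how the candidate set is restricted, so ranking attributes by their single-attribute score and keeping the top `k` is an assumption here. The names `linear_forward_selection`, `evaluate`, `k`, `max_size`, and `toy_score` are all hypothetical. In a real wrapper, `evaluate` would be the cross-validated accuracy of a classifier on the given subset, and `max_size` stands in for the precomputed subset size of the second variant.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def linear_forward_selection(
    attributes: Sequence[int],
    evaluate: Callable[[FrozenSet[int]], float],
    k: int,
    max_size: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy forward selection restricted to the k top-ranked attributes."""
    # Assumption: rank attributes by their individual score and keep the top k,
    # so each step expands at most k candidates instead of all attributes.
    ranked = sorted(attributes, key=lambda a: evaluate(frozenset([a])), reverse=True)
    candidates = ranked[:k]

    selected: List[int] = []
    best_score = float("-inf")
    while candidates and (max_size is None or len(selected) < max_size):
        # Score every one-attribute expansion of the current subset.
        scored = [(evaluate(frozenset(selected + [a])), a) for a in candidates]
        score, attr = max(scored)
        if score <= best_score:
            break  # no expansion improves the score, so stop
        selected.append(attr)
        candidates.remove(attr)
        best_score = score
    return selected, best_score


# Hypothetical stand-in for a wrapper evaluator: rewards three "useful"
# attributes and slightly penalizes every other attribute in the subset.
USEFUL = frozenset({2, 5, 7})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


subset, score = linear_forward_selection(list(range(10)), toy_score, k=4)
capped, capped_score = linear_forward_selection(
    list(range(10)), toy_score, k=4, max_size=2
)
```

With `k` much smaller than the number of attributes, each step evaluates at most `k` subsets rather than one per remaining attribute, which is where the runtime saving would come from under these assumptions.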
Pages: 332-339
Number of pages: 8