VSURF: An R Package for Variable Selection Using Random Forests

Cited by: 0
Authors
Genuer, Robin [1, 2]
Poggi, Jean-Michel [3 ]
Tuleau-Malot, Christine [4 ]
Affiliations
[1] Univ Bordeaux, ISPED, Ctr INSERM Epidemiol Biostat U897, F-33000 Bordeaux, France
[2] INRIA Bordeaux Sud Ouest, SISTM Team, F-33400 Talence, France
[3] Univ Orsay, Math Lab, F-91405 Orsay, France
[4] Univ Nice Sophia Antipolis, LJAD, CNRS, UMR 7351, F-06100 Nice, France
Source
R Journal | 2015 / Volume 7 / Issue 2
Keywords
GENE-EXPRESSION DATA; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables that includes some redundancy, which can be relevant for interpretation. The second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance, and then proceeds with a stepwise forward strategy for variable introduction. Both proposals can be obtained automatically using data-driven default values, which are good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example, and its applications to real datasets are presented.
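The two-stage strategy described in the abstract (rank variables by permutation importance, then introduce them stepwise) can be illustrated with a minimal, self-contained Python sketch. This is not the VSURF implementation, which is an R package with data-driven thresholds. The small bagged-tree ensemble, the function names, the parameters (`n_trees`, `max_depth`, `min_leaf`, `min_gain`), the fixed relative threshold, and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import fmean

def build_tree(X, y, rows, features, rng, depth=0, max_depth=4, min_leaf=5):
    """Grow a regression tree, trying a random subset of features at each node."""
    if depth == max_depth or len(rows) < 2 * min_leaf:
        return fmean(y[i] for i in rows)            # leaf: mean response
    mtry = max(1, len(features) // 3)               # assumed p/3 rule for regression
    best = None
    for j in rng.sample(features, mtry):
        order = sorted(rows, key=lambda i: X[i][j])
        total, left = sum(y[i] for i in order), 0.0
        for k in range(1, len(order)):
            left += y[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if k < min_leaf or len(order) - k < min_leaf or lo == hi:
                continue
            # Maximising this score is equivalent to minimising squared error.
            score = left ** 2 / k + (total - left) ** 2 / (len(order) - k)
            if best is None or score > best[0]:
                best = (score, j, lo)
    if best is None:
        return fmean(y[i] for i in rows)
    _, j, thr = best
    left_rows = [i for i in rows if X[i][j] <= thr]
    right_rows = [i for i in rows if X[i][j] > thr]
    return (j, thr,
            build_tree(X, y, left_rows, features, rng, depth + 1, max_depth, min_leaf),
            build_tree(X, y, right_rows, features, rng, depth + 1, max_depth, min_leaf))

def predict(tree, x):
    while isinstance(tree, tuple):                  # internal nodes are tuples
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree

def fit_forest(X, y, features, rng, n_trees=30):
    """Bagged trees; each tree keeps the rows left out of its bootstrap sample."""
    n, forest = len(X), []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(bag))
        forest.append((build_tree(X, y, bag, features, rng), oob))
    return forest

def tree_mse(tree, X, y, rows):
    return fmean((y[i] - predict(tree, X[i])) ** 2 for i in rows)

def permutation_importance(forest, X, y, features, rng):
    """Mean increase in per-tree out-of-bag MSE after shuffling one variable."""
    scores = {j: 0.0 for j in features}
    for tree, oob in forest:
        base = tree_mse(tree, X, y, oob)
        for j in features:
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            X_perm = {i: X[i][:j] + [v] + X[i][j + 1:] for i, v in zip(oob, shuffled)}
            scores[j] += (tree_mse(tree, X_perm, y, oob) - base) / len(forest)
    return scores

def oob_error(forest, X, y):
    """MSE of the aggregated out-of-bag predictions."""
    sums, counts = [0.0] * len(X), [0] * len(X)
    for tree, oob in forest:
        for i in oob:
            sums[i] += predict(tree, X[i])
            counts[i] += 1
    return fmean((y[i] - sums[i] / counts[i]) ** 2 for i in range(len(X)) if counts[i])

def select_variables(X, y, rng, min_gain=0.05):
    features = list(range(len(X[0])))
    # Stage 1: rank all variables by permutation importance.
    importance = permutation_importance(fit_forest(X, y, features, rng), X, y, features, rng)
    ranked = sorted(features, key=lambda j: importance[j], reverse=True)
    # Stage 2: walk down the ranking; keep a variable only if it lowers the
    # out-of-bag error by more than a (hypothetical) fixed relative threshold.
    y_bar = fmean(y)
    selected, best_err = [], fmean((v - y_bar) ** 2 for v in y)
    for j in ranked:
        err = oob_error(fit_forest(X, y, selected + [j], rng), X, y)
        if best_err - err > min_gain * best_err:
            selected.append(j)
            best_err = err
    return ranked, selected

# Synthetic data: only variables 0 and 1 drive the response.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(150)]
y = [3 * row[0] + 2 * row[1] + rng.gauss(0, 0.1) for row in X]
ranked, selected = select_variables(X, y, rng)
print("ranking:", ranked)
print("selected:", selected)
```

The fixed `min_gain` threshold is the main simplification here: according to the abstract, the package derives its default values from the data, whereas this sketch asks the caller to choose one.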
Pages: 19-33
Page count: 15