Empirical characterization of random forest variable importance measures

Cited by: 746
Authors
Archer, Kellie J. [1]
Kimes, Ryan V. [1]
Affiliation
[1] Virginia Commonwealth Univ, Dept Biostat, Richmond, VA 23298 USA
Keywords
random forest; classification tree; variable importance; bootstrap aggregating
DOI
10.1016/j.csda.2007.08.015
Chinese Library Classification
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Microarray studies yield data sets consisting of a large number of candidate predictors (genes) on a small number of observations (samples). When interest lies in predicting phenotypic class using gene expression data, often the goals are both to produce an accurate classifier and to uncover the predictive structure of the problem. Most machine learning methods, such as k-nearest neighbors, support vector machines, and neural networks, are useful for classification. However, these methods provide no insight regarding the covariates that best contribute to the predictive structure. Other methods, such as linear discriminant analysis, require the predictor space be substantially reduced prior to deriving the classifier. A recently developed method, random forests (RF), does not require reduction of the predictor space prior to classification. Additionally, RF yield variable importance measures for each candidate predictor. This study examined the effectiveness of RF variable importance measures in identifying the true predictor among a large number of candidate predictors. An extensive simulation study was conducted using 20 levels of correlation among the predictor variables and 7 levels of association between the true predictor and the dichotomous response. We conclude that the RF methodology is attractive for use in classification problems when the goals of the study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables. Such goals are common among microarray studies, and therefore application of the RF methodology for the purpose of obtaining variable importance measures is demonstrated on a microarray data set. © 2007 Elsevier B.V. All rights reserved.
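The simulation design described in the abstract (correlated candidate predictors, one true predictor, a dichotomous response, and a permutation-style importance score) can be illustrated with a small sketch. This is not the authors' procedure: it substitutes a bagged ensemble of decision stumps with random feature subsets for a full random forest, and every name and setting in it (`simulate`, `fit_stump`, `permutation_importance`, `rho`, `beta`, `mtry`, the sample sizes) is an assumption chosen for illustration only.

```python
import random

def simulate(n, p, rho, beta, rng):
    """Equicorrelated predictors (pairwise correlation rho); only column 0 drives y."""
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        row = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
               for _ in range(p)]
        X.append(row)
        # beta sets the strength of association between predictor 0 and the response
        y.append(1 if beta * row[0] + rng.gauss(0, 1) > 0 else 0)
    return X, y

def fit_stump(X, y, rows, feats):
    """Pick the (feature, threshold, label above threshold) with best training accuracy."""
    best = (-1.0, None, None, None)
    for j in feats:
        vals = sorted(X[i][j] for i in rows)
        for q in range(1, 10):  # candidate thresholds at the sample deciles
            t = vals[q * len(vals) // 10]
            acc = sum((X[i][j] > t) == (y[i] == 1) for i in rows) / len(rows)
            hi_label = 1
            if acc < 0.5:  # flipping the labels gives the better split
                acc, hi_label = 1 - acc, 0
            if acc > best[0]:
                best = (acc, j, t, hi_label)
    return best[1:]

def predict(stump, x):
    j, t, hi_label = stump
    return hi_label if x[j] > t else 1 - hi_label

def permutation_importance(X, y, n_trees, mtry, rng):
    """Mean drop in out-of-bag accuracy after permuting each feature."""
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        stump = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        j = stump[0]
        base = sum(predict(stump, X[i]) == y[i] for i in oob) / len(oob)
        # A stump uses one feature, so permuting any other column changes nothing.
        shuffled = [X[i][j] for i in oob]
        rng.shuffle(shuffled)
        hits = 0
        for i, v in zip(oob, shuffled):
            row = list(X[i])
            row[j] = v
            hits += predict(stump, row) == y[i]
        importance[j] += (base - hits / len(oob)) / n_trees
    return importance

rng = random.Random(0)
X, y = simulate(n=200, p=10, rho=0.3, beta=2.0, rng=rng)
importance = permutation_importance(X, y, n_trees=300, mtry=3, rng=rng)
ranking = sorted(range(len(importance)), key=lambda j: -importance[j])
print(ranking[:3])
```

Varying `rho` and `beta` over a grid would mimic, in miniature, the study's crossing of correlation levels with association levels; the question at each grid point is whether predictor 0 still ranks first.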
Pages: 2249-2260
Number of pages: 12