Tree aggregation for random forest class probability estimation

Cited by: 23
Authors
Sage, Andrew J. [1]
Genschel, Ulrike [2]
Nettleton, Dan [2]
Affiliations
[1] Lawrence Univ, Dept Math & Comp Sci, Appleton, WI 54912 USA
[2] Iowa State Univ, Dept Stat, Ames, IA USA
Keywords
aggregation; class probability estimation; random forest; REGRESSION; ERROR
DOI
10.1002/sam.11446
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In random forest methodology, an overall prediction or estimate is made by aggregating predictions made by individual decision trees. Popular implementations of random forests rely on different methods for aggregating predictions. In this study, we provide an empirical analysis of the performance of aggregation approaches available for classification and regression problems. We show that while the choice of aggregation scheme usually has little impact in regression, it can have a profound effect on probability estimation in classification problems. Our study illustrates the causes of calibration issues that arise from two popular aggregation approaches and highlights the important role that terminal nodesize plays in the aggregation of tree predictions. We show that optimal choices for random forest tuning parameters depend heavily on the manner in which tree predictions are aggregated.
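The abstract does not name the two aggregation approaches it compares. As a minimal sketch, assume they are (a) the share of trees that vote for a class and (b) the average of the class proportions in the terminal nodes a case reaches. The function names and the per-tree values below are hypothetical and are not taken from the paper.

```python
from typing import List


def vote_share(tree_probs: List[float], threshold: float = 0.5) -> float:
    """Aggregate by voting: fraction of trees whose own class-1
    estimate exceeds the threshold (assumed scheme, not from the paper)."""
    votes = [1 if p > threshold else 0 for p in tree_probs]
    return sum(votes) / len(votes)


def mean_probability(tree_probs: List[float]) -> float:
    """Aggregate by averaging: mean of the per-tree class-1 proportions
    (assumed scheme, not from the paper)."""
    return sum(tree_probs) / len(tree_probs)


# Hypothetical class-1 proportions in the terminal node that a single
# case reaches in each of five trees (illustrative values only).
tree_probs = [0.6, 0.7, 0.55, 0.4, 0.65]

print("vote share:      ", vote_share(tree_probs))
print("mean probability:", round(mean_probability(tree_probs), 4))

# With pure terminal nodes every per-tree estimate is 0 or 1,
# so the two aggregates coincide.
pure_probs = [1.0, 1.0, 1.0, 0.0, 1.0]
print("pure nodes:      ", vote_share(pure_probs), mean_probability(pure_probs))
```

In this toy case four of the five trees favor class 1, so voting gives 0.8 while averaging gives about 0.58. The gap closes when terminal nodes are pure, which is one way to see why terminal node size matters for how tree predictions are aggregated.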
Pages: 134-150
Page count: 17