Tree aggregation for random forest class probability estimation

Cited by: 23
Authors
Sage, Andrew J. [1]
Genschel, Ulrike [2]
Nettleton, Dan [2]
Affiliations
[1] Lawrence Univ, Dept Math & Comp Sci, Appleton, WI 54912 USA
[2] Iowa State Univ, Dept Stat, Ames, IA USA
Keywords
aggregation; class probability estimation; random forest; REGRESSION; ERROR
DOI
10.1002/sam.11446
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In random forest methodology, an overall prediction or estimate is made by aggregating predictions made by individual decision trees. Popular implementations of random forests rely on different methods for aggregating predictions. In this study, we provide an empirical analysis of the performance of aggregation approaches available for classification and regression problems. We show that while the choice of aggregation scheme usually has little impact in regression, it can have a profound effect on probability estimation in classification problems. Our study illustrates the causes of calibration issues that arise from two popular aggregation approaches and highlights the important role that terminal nodesize plays in the aggregation of tree predictions. We show that optimal choices for random forest tuning parameters depend heavily on the manner in which tree predictions are aggregated.
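The abstract does not name the two aggregation approaches it compares. As an illustration only, the sketch below assumes a common pair: (a) each tree casts one vote for its most probable class and the forest reports vote shares, and (b) the forest averages the per-tree class proportions. The function names and the toy per-tree values are hypothetical and are not taken from the paper.

```python
# Minimal sketch of two ways to aggregate per-tree class probability
# estimates. Assumption: these are the vote-share and probability-averaging
# schemes; the abstract itself does not specify which schemes are studied.

def vote_aggregate(tree_probs):
    """Each tree votes for its most probable class; return the vote shares."""
    n_classes = len(tree_probs[0])
    votes = [0] * n_classes
    for probs in tree_probs:
        votes[probs.index(max(probs))] += 1  # ties go to the lowest class index
    return [v / len(tree_probs) for v in votes]


def average_aggregate(tree_probs):
    """Average the per-tree class proportions across all trees."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    return [sum(p[k] for p in tree_probs) / n_trees for k in range(n_classes)]


# Hypothetical terminal-node class proportions from four trees, two classes.
tree_probs = [[0.6, 0.4], [0.7, 0.3], [0.55, 0.45], [0.2, 0.8]]

vote_estimate = vote_aggregate(tree_probs)    # three of four trees favor class 0
avg_estimate = average_aggregate(tree_probs)  # mean of the class proportions

# With pure terminal nodes every tree reports 0 or 1, so both schemes agree.
pure_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pure_vote = vote_aggregate(pure_probs)
pure_avg = average_aggregate(pure_probs)
```

In this toy case the vote shares are far more extreme than the averaged proportions, although both schemes see the same trees. When every terminal node is pure, each tree reports only 0 or 1 and the two estimates coincide, which is one way to see why terminal node size and aggregation scheme interact.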
Pages: 134-150
Number of pages: 17
Related papers
50 in total
  • [1] A Preliminary Study on Class Probability Estimation for Random Forest Using Kernel Density estimators
    Yang, Fan
    Peng, Piao
    Zhou, Qifeng
    2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE), 2016, : 118 - 122
  • [2] DECISION TREE WITH BETTER CLASS PROBABILITY ESTIMATION
    Jiang, Liangxiao
    Li, Chaoqun
    Cai, Zhihua
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2009, 23 (04) : 745 - 763
  • [3] Probabilistic logging lithology characterization with random forest probability estimation
    Ao, Yile
    Zhu, Liping
    Guo, Shuang
    Yang, Zhongguo
    COMPUTERS & GEOSCIENCES, 2020, 144
  • [4] Improving Tree augmented Naive Bayes for class probability estimation
    Jiang, Liangxiao
    Cai, Zhihua
    Wang, Dianhong
    Zhang, Harry
    KNOWLEDGE-BASED SYSTEMS, 2012, 26 : 239 - 245
  • [5] Understanding overfitting in random forest for probability estimation: a visualization and simulation study
    Barrenada, Lasai
    Dhiman, Paula
    Timmerman, Dirk
    Boulesteix, Anne-Laure
    Van Calster, Ben
    DIAGNOSTIC AND PROGNOSTIC RESEARCH, 2024, 8 (01)
  • [6] Conditional probability estimation based classification with class label missing at random
    Sheng, Ying
    Wang, Qihua
    JOURNAL OF MULTIVARIATE ANALYSIS, 2020, 176
  • [7] Locally private estimation of conditional probability distribution for random forest in multimedia applications
    Wu, Xiaotong
    Bilal, Muhammad
    Xu, Xiaolong
    Song, Houbing
    INFORMATION SCIENCES, 2023, 642
  • [8] Correction: Understanding overfitting in random forest for probability estimation: a visualization and simulation study
    Lasai Barreñada
    Paula Dhiman
    Dirk Timmerman
    Anne-Laure Boulesteix
    Ben Van Calster
    Diagnostic and Prognostic Research, 9 (1)
  • [9] MINIMAX ESTIMATION OF A RANDOM PROBABILITY
    SKIBINSK.M
    SIAM JOURNAL ON APPLIED MATHEMATICS, 1968, 16 (01) : 134 - &
  • [10] THE AGGREGATION PROBABILITY IN RANDOM COAGULATION AND BREAKUP PROCESSES
    COHEN, RD
    PARTICLE & PARTICLE SYSTEMS CHARACTERIZATION, 1992, 9 (01) : 28 - 30