Tree aggregation for random forest class probability estimation

Times cited: 23

Authors:

Sage, Andrew J. [1]

Genschel, Ulrike [2]

Nettleton, Dan [2]

Affiliations:

[1] Lawrence Univ, Dept Math & Comp Sci, Appleton, WI 54912 USA

[2] Iowa State Univ, Dept Stat, Ames, IA USA

Source:

STATISTICAL ANALYSIS AND DATA MINING | 2020 / Vol. 13 / No. 02

Keywords:

aggregation; class probability estimation; random forest; regression; error

DOI:

10.1002/sam.11446

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In random forest methodology, an overall prediction or estimate is made by aggregating predictions made by individual decision trees. Popular implementations of random forests rely on different methods for aggregating predictions. In this study, we provide an empirical analysis of the performance of aggregation approaches available for classification and regression problems. We show that while the choice of aggregation scheme usually has little impact in regression, it can have a profound effect on probability estimation in classification problems. Our study illustrates the causes of calibration issues that arise from two popular aggregation approaches and highlights the important role that terminal node size plays in the aggregation of tree predictions. We show that optimal choices for random forest tuning parameters depend heavily on the manner in which tree predictions are aggregated.
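To make "aggregating tree predictions" concrete, here is a minimal Python sketch contrasting two common ways of turning tree outputs into a forest-level class probability: the fraction of trees that vote for a class (as R's `randomForest` does for `type = "prob"`) and the average of per-tree leaf class proportions (as scikit-learn's `predict_proba` does). The abstract does not name the two approaches it studies; these two are assumed here only for illustration, and this is not the authors' code. The leaf-count representation and the function names `leaf_proportions`, `vote_fraction`, and `averaged_proportion` are hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical, simplified representation: for one query point, each tree is
# summarized only by the training-class counts in the terminal node (leaf)
# that the point falls into.
LeafCounts = Dict[str, int]


def leaf_proportions(counts: LeafCounts) -> Dict[str, float]:
    """Class proportions within a single leaf (one tree's probability estimate)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def vote_fraction(leaves: Sequence[LeafCounts], label: str) -> float:
    """Fraction of trees whose leaf-majority class equals `label`.

    Ties within a leaf go to the class listed first in the dict.
    """
    votes = Counter(max(counts, key=counts.get) for counts in leaves)
    return votes[label] / len(leaves)


def averaged_proportion(leaves: Sequence[LeafCounts], label: str) -> float:
    """Mean over trees of the per-leaf proportion of `label`."""
    return sum(leaf_proportions(c).get(label, 0.0) for c in leaves) / len(leaves)


if __name__ == "__main__":
    # Impure leaves (larger terminal nodes): the two schemes disagree.
    impure = [{"A": 3, "B": 2}, {"A": 3, "B": 2}, {"A": 1, "B": 4}]
    print(round(vote_fraction(impure, "A"), 3))        # 0.667
    print(round(averaged_proportion(impure, "A"), 3))  # 0.467

    # Pure leaves (e.g. trees grown to very small terminal nodes): they coincide.
    pure = [{"A": 1}, {"A": 1}, {"B": 1}]
    print(round(vote_fraction(pure, "A"), 3))          # 0.667
    print(round(averaged_proportion(pure, "A"), 3))    # 0.667
```

When every leaf is pure, each tree's class proportion is 0 or 1 and equals its vote, so the two schemes give identical estimates; with larger, impure terminal nodes they can diverge, which is one simple way terminal node size interacts with the choice of aggregation.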


Pages: 134–150

Number of pages: 17
