Double random forest

Cited: 33
Authors
Han, Sunwoo [1 ]
Kim, Hyunjoong [2 ]
Lee, Yung-Seop [3 ]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
[3] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore
Keywords
Classification; Ensemble; Random forest; Bootstrap; Decision tree; Classification trees; Algorithms; Ensembles
DOI
10.1007/s10994-020-05889-1
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Random forest (RF) is one of the most popular parallel ensemble methods that use decision trees as classifiers. One of the hyper-parameters in RF fitting is nodesize, which determines the size of individual trees. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully, i.e., with the minimum nodesize. This observation suggests that prediction accuracy could improve further if we could generate trees even bigger than those grown with the minimum nodesize; in other words, the largest tree obtainable under the minimum nodesize may still not be large enough for the best performance of RF. To produce bigger trees than those of RF, we propose a new classification ensemble method called double random forest (DRF). The new method draws a bootstrap sample at each node during tree construction, instead of bootstrapping only once at the root node as in RF. This, in turn, yields an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce sufficiently large trees, we demonstrate that DRF provides more accurate predictions than RF.
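The per-node bootstrap described above can be sketched compactly. The following is a minimal illustration of the idea, not the authors' implementation: the names (gini, drf_split, build_tree) and the defaults mtry=2 and min_node=5 are placeholders, class labels are assumed to be non-negative integers, and mtry must not exceed the number of features.

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def drf_split(X, y, mtry, rng):
    """Pick a split using a bootstrap sample drawn from THIS node's rows.

    The per-node bootstrap is the key difference from standard RF,
    which bootstraps the training data only once, at the root.
    """
    boot = rng.integers(0, len(y), size=len(y))   # bootstrap within the node
    Xb, yb = X[boot], y[boot]
    best = None                                   # (weighted impurity, feature, threshold)
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        for t in np.unique(Xb[:, j])[:-1]:        # candidate thresholds (exclude max)
            left = Xb[:, j] <= t
            score = (left.sum() * gini(yb[left])
                     + (~left).sum() * gini(yb[~left])) / len(yb)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, mtry=2, min_node=5, rng=None):
    """Grow a tree: the split is chosen on the node's bootstrap sample but
    applied to the node's ORIGINAL rows, so growth continues until nodes
    are pure or smaller than min_node."""
    rng = rng or np.random.default_rng()
    if len(y) < min_node or gini(y) == 0.0:
        return {"leaf": int(np.bincount(y).argmax())}
    best = drf_split(X, y, mtry, rng)
    if best is None:
        return {"leaf": int(np.bincount(y).argmax())}
    _, j, t = best
    mask = X[:, j] <= t
    if mask.all() or not mask.any():              # split degenerate on the real data
        return {"leaf": int(np.bincount(y).argmax())}
    return {"feat": int(j), "thr": float(t),
            "left": build_tree(X[mask], y[mask], mtry, min_node, rng),
            "right": build_tree(X[~mask], y[~mask], mtry, min_node, rng)}
```

In this sketch, because the split is selected on the node's bootstrap sample yet applied to all of the node's original observations, trees tend to grow larger than standard RF trees; a DRF-style ensemble would grow many such trees and classify by majority vote.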
Pages: 1569-1586
Page count: 18
Related papers
50 records in total
  • [31] Random Pairwise Shapelets Forest
    Shi, Mohan
    Wang, Zhihai
    Yuan, Jidong
    Liu, Haiyang
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT I, 2018, 10937 : 68 - 80
  • [32] Random Forest Prediction Intervals
    Zhang, Haozhe
    Zimmerman, Joshua
    Nettleton, Dan
    Nordman, Daniel J.
AMERICAN STATISTICIAN, 2020, 74 (04): 392 - 406
  • [33] Musical Instruments in Random Forest
    Kursa, Miron
    Rudnicki, Witold
    Wieczorkowska, Alicja
    Kubera, Elzbieta
    Kubik-Komar, Agnieszka
FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2009, 5722 : 281+
  • [34] Random Forest for Image Annotation
    Fu, Hao
    Zhang, Qian
    Qiu, Guoping
    COMPUTER VISION - ECCV 2012, PT VI, 2012, 7577 : 86 - 99
  • [35] Global Refinement of Random Forest
    Ren, Shaoqing
    Cao, Xudong
    Wei, Yichen
    Sun, Jian
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 723 - 730
  • [36] A random forest guided tour
Biau, Gérard
Scornet, Erwan
    TEST, 2016, 25 : 197 - 227
  • [37] A weighted random survival forest
    Utkin, Lev V.
    Konstantinov, Andrei V.
    Chukanov, Viacheslav S.
    Kots, Mikhail V.
    Ryabinin, Mikhail A.
    Meldo, Anna A.
    KNOWLEDGE-BASED SYSTEMS, 2019, 177 : 136 - 144
  • [38] Early Random Shapelet Forest
    Karlsson, Isak
    Papapetrou, Panagiotis
    Bostrom, Henrik
    DISCOVERY SCIENCE, (DS 2016), 2016, 9956 : 261 - 276
  • [39] Random Forest Spatial Interpolation
    Sekulic, Aleksandar
    Kilibarda, Milan
    Heuvelink, Gerard B. M.
    Nikolic, Mladen
    Bajat, Branislav
    REMOTE SENSING, 2020, 12 (10)
  • [40] Differential Private Random Forest
    Patil, Abhijit
    Singh, Sanjay
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 2623 - 2630