Double random forest

Cited by: 33
Authors
Han, Sunwoo [1 ]
Kim, Hyunjoong [2 ]
Lee, Yung-Seop [3 ]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
[3] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Classification; Ensemble; Random forest; Bootstrap; Decision tree; CLASSIFICATION TREES; ALGORITHMS; ENSEMBLES;
DOI
10.1007/s10994-020-05889-1
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to choose when fitting an RF is nodesize, which determines the size of the individual trees. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation suggests that prediction accuracy could be improved further if there were a way to generate trees even bigger than those obtained with the minimum nodesize; in other words, the largest tree created with the minimum nodesize may not be sufficiently large for the best performance of RF. To produce bigger trees than those grown by RF, we propose a new classification ensemble method called double random forest (DRF). The new method draws a bootstrap sample at each node during tree construction, instead of bootstrapping only once at the root node as in RF. This, in turn, yields an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we demonstrate that DRF provides more accurate predictions than RF.
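The node-level bootstrapping that distinguishes DRF from RF can be illustrated with a short sketch. The Python code below is a minimal, illustrative reading of the abstract, not the authors' implementation: at every internal node it draws a bootstrap sample of that node's rows, selects a split on the bootstrap sample over a random feature subset, and then routes the original (non-resampled) rows to the children. All names (grow_drf_tree, best_split, gini, predict_one) and details such as the Gini impurity criterion and axis-aligned thresholds are assumptions made for this sketch.

```python
# Sketch of node-level bootstrapping: at every node, a bootstrap sample of
# that node's data is used to choose the split, while the original rows are
# routed to the children. Illustrative only, not the authors' reference code.
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, n_features_try, rng):
    """Search a random feature subset for the split minimizing weighted Gini."""
    n, d = X.shape
    features = rng.choice(d, size=min(n_features_try, d), replace=False)
    best = None  # (score, feature, threshold)
    for j in features:
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow_drf_tree(X, y, n_features_try, min_node=1, rng=None):
    """Grow one tree, re-bootstrapping the data at every internal node."""
    rng = rng or np.random.default_rng()
    if len(y) <= min_node or len(np.unique(y)) == 1:
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}
    # Node-level bootstrap: resample this node's rows to pick the split.
    idx = rng.integers(0, len(y), size=len(y))
    split = best_split(X[idx], y[idx], n_features_try, rng)
    if split is None:
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}
    _, j, t = split
    left = X[:, j] <= t  # route the ORIGINAL rows, not the bootstrap sample
    if left.all() or (~left).all():
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}
    return {"feature": j, "threshold": t,
            "left": grow_drf_tree(X[left], y[left], n_features_try, min_node, rng),
            "right": grow_drf_tree(X[~left], y[~left], n_features_try, min_node, rng)}

def predict_one(node, x):
    """Route a single sample down the tree to its leaf label."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
tree = grow_drf_tree(X, y, n_features_try=2, rng=rng)
print(predict_one(tree, X[0]), y[0])
```

Because each node's split is chosen on its own resample while the full node data flows downward, the resulting trees tend to be larger and more diverse than RF trees grown with a single root-level bootstrap, which is the effect the abstract attributes to DRF.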
Pages: 1569 - 1586
Number of pages: 18
Related papers
50 records in total
  • [21] Exponentially Weighted Random Forest
    Jain, Vikas
    Sharma, Jaya
    Singhal, Kriti
    Phophalia, Ashish
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2019, PT I, 2019, 11941 : 170 - 178
  • [22] Visualisation of Random Forest classification
    Macas, Catarina
    Campos, Joao R.
    Lourenco, Nuno
    Machado, Penousal
    INFORMATION VISUALIZATION, 2024, 23 (04) : 312 - 327
  • [23] PCA Embedded Random Forest
    Gardner, Charles
    Lo, Dan Chia-Tien
    SOUTHEASTCON 2021, 2021, : 783 - 788
  • [24] Random Forest for the Real Forests
    Agrawal, Sharan
    Rana, Shivam
    Ahmad, Tanvir
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 3, 2016, 381 : 301 - 309
  • [25] Improved Random Forest for Classification
    Paul, Angshuman
    Mukherjee, Dipti Prasad
    Das, Prasun
    Gangopadhyay, Abhinandan
    Chintha, Appa Rao
    Kundu, Saurabh
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (08) : 4012 - 4024
  • [26] Dissimilarity Random Forest Clustering
    Bicego, Manuele
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 936 - 941
  • [27] Face classification by a random forest
    Kouzani, A. Z.
    Nahavandi, S.
    Khoshmanesh, K.
    TENCON 2007 - 2007 IEEE REGION 10 CONFERENCE, VOLS 1-3, 2007, : 652 - 655
  • [28] Playing in Unison in the Random Forest
    Wieczorkowska, Alicja A.
    Kursa, Miron B.
    Kubera, Elzbieta
    Rudnicki, Radoslaw
    Rudnicki, Witold R.
    SECURITY AND INTELLIGENT INFORMATION SYSTEMS, 2012, 7053 : 226 - +
  • [29] Thresholding a Random Forest Classifier
    Baumann, Florian
    Li, Fangda
    Ehlers, Arne
    Rosenhahn, Bodo
    ADVANCES IN VISUAL COMPUTING (ISVC 2014), PT II, 2014, 8888 : 95 - 106
  • [30] A random forest guided tour
    Biau, Gerard
    Scornet, Erwan
    TEST, 2016, 25 (02) : 197 - 227