Double random forest

Cited by: 33
Authors
Han, Sunwoo [1 ]
Kim, Hyunjoong [2 ]
Lee, Yung-Seop [3 ]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
[3] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Classification; Ensemble; Random forest; Bootstrap; Decision tree; CLASSIFICATION TREES; ALGORITHMS; ENSEMBLES;
DOI
10.1007/s10994-020-05889-1
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as base classifiers. One of the hyper-parameters to tune when fitting RF is nodesize, which determines the size of the individual trees. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully by minimizing the nodesize parameter. This observation leads to the idea that prediction accuracy could be further improved if we could generate even bigger trees than those grown with the minimum nodesize. In other words, the largest tree created with the minimum nodesize parameter may not be sufficiently large for the best performance of RF. To produce bigger trees than those of RF, we propose a new classification ensemble method called double random forest (DRF). The new method draws a bootstrap sample at each node during tree construction, instead of bootstrapping only once at the root node as in RF. This, in turn, yields an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data where RF does not produce trees of sufficient size, we demonstrate that DRF provides more accurate predictions than RF.
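The abstract's key mechanism can be sketched in a few lines: at every internal node, choose the split on a bootstrap resample of that node's observations, then route the original observations down the chosen split. The sketch below is a minimal, hypothetical illustration of that idea under simplifying assumptions (binary classification, exhaustive threshold search, no per-node feature subsampling); it is not the authors' reference implementation.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair minimizing Gini impurity."""
    best, best_gini = None, float("inf")
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # exclude max so both sides are non-empty
            left = X[:, j] <= t
            if left.sum() == 0 or (~left).sum() == 0:
                continue
            g = 0.0
            for mask in (left, ~left):
                p = np.bincount(y[mask], minlength=2) / mask.sum()
                g += mask.sum() / len(y) * (1.0 - (p ** 2).sum())
            if g < best_gini:
                best_gini, best = g, (j, t)
    return best

def grow_drf_tree(X, y, rng, min_node=2):
    """DRF sketch: the split at EVERY node is chosen on a bootstrap resample
    of that node's data, but the ORIGINAL observations are sent down the split
    (in plain RF, bootstrapping happens only once, at the root)."""
    if len(np.unique(y)) == 1 or len(y) < min_node:
        return int(np.bincount(y).argmax())     # leaf: majority class
    boot = rng.integers(0, len(y), len(y))      # node-level bootstrap indices
    split = best_split(X[boot], y[boot])
    if split is None:
        return int(np.bincount(y).argmax())
    j, t = split
    left = X[:, j] <= t
    if left.all() or (~left).all():             # degenerate split on original data
        return int(np.bincount(y).argmax())
    return (j, t,
            grow_drf_tree(X[left], y[left], rng, min_node),
            grow_drf_tree(X[~left], y[~left], rng, min_node))

def predict_one(tree, x):
    """Walk the nested-tuple tree until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree
```

Averaging such trees over many bootstrap replicates of the full data set would give the DRF ensemble; because each node re-samples, the trees are more diverse (and tend to grow larger) than those of standard RF.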
Pages: 1569-1586
Page count: 18
Related papers
50 records in total
  • [1] Double random forest
    Sunwoo Han
    Hyunjoong Kim
    Yung-Seop Lee
    Machine Learning, 2020, 109 : 1569 - 1586
  • [2] Oblique and rotation double random forest
    Ganaie, M. A.
    Tanveer, M.
    Suganthan, P. N.
    Snasel, V.
    NEURAL NETWORKS, 2022, 153 : 496 - 517
  • [3] A comparison of random forest based algorithms: random credal random forest versus oblique random forest
    Carlos J. Mantas
    Javier G. Castellano
    Serafín Moral-García
    Joaquín Abellán
    Soft Computing, 2019, 23 : 10739 - 10754
  • [4] A comparison of random forest based algorithms: random credal random forest versus oblique random forest
    Mantas, Carlos J.
    Castellano, Javier G.
    Moral-Garcia, Serafin
    Abellan, Joaquin
    SOFT COMPUTING, 2019, 23 (21) : 10739 - 10754
  • [5] Specific Random Trees for Random Forest
    Liu, Zhi
    Sun, Zhaocai
    Wang, Hongjun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (03) : 739 - 741
  • [6] A Control Optimization Model for a Double-Skin Facade Based on the Random Forest Algorithm
    Sun, Qing
    Du, Yifan
    Yan, Xiuying
    Song, Junwei
    Zhao, Long
    Buildings, 2024, 14 (10)
  • [7] Boosted Random Forest
    Mishina, Yohei
    Murata, Ryuei
    Yamauchi, Yuji
    Yamashita, Takayoshi
    Fujiyoshi, Hironobu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (09) : 1630 - 1636
  • [8] The macroeconomy as a random forest
    Goulet Coulombe, Philippe
    JOURNAL OF APPLIED ECONOMETRICS, 2024, 39 (03) : 401 - 421
  • [9] Boosted Random Forest
    Mishina, Yohei
    Tsuchiya, Masamitsu
    Fujiyoshi, Hironobu
    PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, 2014, : 594 - 598
  • [10] Reinforced Random Forest
    Paul, Angshuman
    Mukherjee, Dipti Prasad
    TENTH INDIAN CONFERENCE ON COMPUTER VISION, GRAPHICS AND IMAGE PROCESSING (ICVGIP 2016), 2016,