Double random forest

Cited by: 33
Authors
Han, Sunwoo [1 ]
Kim, Hyunjoong [2 ]
Lee, Yung-Seop [3 ]
Institutions
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
[3] Dongguk Univ, Dept Stat, Seoul 04620, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Classification; Ensemble; Random forest; Bootstrap; Decision tree; CLASSIFICATION TREES; ALGORITHMS; ENSEMBLES;
DOI
10.1007/s10994-020-05889-1
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Random forest (RF) is one of the most popular parallel ensemble methods, using decision trees as classifiers. One of the hyper-parameters to tune when fitting RF is nodesize, which determines the size of the individual trees. In this paper, we begin with the observation that for many data sets (34 out of 58), the best RF prediction accuracy is achieved when the trees are grown fully, i.e., with the nodesize parameter set to its minimum. This observation leads to the idea that prediction accuracy could be further improved if we could generate even bigger trees than those grown with the minimum nodesize; in other words, the largest tree created with the minimum nodesize may not be sufficiently large for the best performance of RF. To produce bigger trees than RF does, we propose a new classification ensemble method called double random forest (DRF). The new method applies bootstrapping at each node during tree construction, instead of bootstrapping only once at the root node as in RF. This, in turn, yields an ensemble of more diverse trees, allowing for more accurate predictions. Finally, for data on which RF does not produce trees of sufficient size, we demonstrate that DRF provides more accurate predictions than RF.
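The abstract's key algorithmic idea is bootstrapping at every node rather than only once at the root. Below is a minimal, illustrative Python sketch of that node-level bootstrapping idea, not the authors' implementation: names such as grow_tree, drf_fit_predict, mtry, and n_trees are made up for illustration, and details the abstract does not specify (root-node handling, stopping rules, threshold search) follow common CART/RF conventions and may differ from the paper.

```python
# Illustrative sketch of node-level bootstrapping (DRF-style); not the paper's code.
import numpy as np

def gini(y):
    """Gini impurity of an integer label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feat_idx):
    """Exhaustively search the given features for the split minimizing weighted Gini impurity."""
    best = (None, None, np.inf)                      # (feature, threshold, score)
    for j in feat_idx:
        for t in np.unique(X[:, j])[:-1]:            # thresholds that keep both sides non-empty
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

def grow_tree(X, y, mtry, rng):
    """Grow one DRF-style tree: each split is chosen on a bootstrap resample of the
    rows that reached the node, then applied to the original (non-resampled) rows."""
    if len(np.unique(y)) == 1:                       # pure node -> leaf
        return {"leaf": y[0]}
    boot = rng.integers(0, len(y), len(y))           # node-level bootstrap of this node's rows
    feat_idx = rng.choice(X.shape[1], size=mtry, replace=False)
    j, t = best_split(X[boot], y[boot], feat_idx)
    if j is None:                                    # no usable split in the resample -> leaf
        vals, counts = np.unique(y, return_counts=True)
        return {"leaf": vals[np.argmax(counts)]}
    left = X[:, j] <= t                              # split the original node data
    return {"feat": j, "thr": t,
            "L": grow_tree(X[left], y[left], mtry, rng),
            "R": grow_tree(X[~left], y[~left], mtry, rng)}

def predict_tree(node, x):
    while "leaf" not in node:
        node = node["L"] if x[node["feat"]] <= node["thr"] else node["R"]
    return node["leaf"]

def drf_fit_predict(X, y, X_test, n_trees=100, seed=0):
    """Fit an ensemble of node-bootstrapped trees and predict by majority vote.
    Assumes y holds integer class labels 0, 1, ..., K-1."""
    rng = np.random.default_rng(seed)
    mtry = max(1, int(np.sqrt(X.shape[1])))          # usual RF default for classification
    trees = [grow_tree(X, y, mtry, rng) for _ in range(n_trees)]
    votes = np.array([[predict_tree(t, x) for t in trees] for x in X_test])
    return np.array([np.bincount(row).argmax() for row in votes])

# Toy usage on synthetic data (illustration only)
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(drf_fit_predict(X[:150], y[:150], X[150:]))
```

The point the sketch tries to make concrete: each tree sees the full training data at its root, and because a split is searched on a resample but applied to all of the node's observations, the trees can grow larger than those built on a single root-level bootstrap, while the per-node resampling injects the extra diversity the abstract credits for the accuracy gain.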
Pages: 1569-1586
Number of pages: 18
Related papers
50 records in total
  • [11] Multinomial random forest
    Bai, Jiawang
    Li, Yiming
    Li, Jiawei
    Yang, Xue
    Jiang, Yong
    Xia, Shu-Tao
    PATTERN RECOGNITION, 2022, 122
  • [12] A fuzzy random forest
    Bonissone, Piero
    Cadenas, Jose M.
    Carmen Garrido, M.
    Andres Diaz-Valladares, R.
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2010, 51 (07) : 729 - 747
  • [13] Using Random Forest To Model the Domain Applicability of Another Random Forest Model
    Sheridan, Robert P.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2013, 53 (11) : 2837 - 2850
  • [14] IoT Intrusion Detection Using Modified Random Forest Based on Double Feature Selection Methods
    Hussein, Adil Yousef
    Falcarin, Paolo
    Sadiq, Ahmed T.
    EMERGING TECHNOLOGY TRENDS IN INTERNET OF THINGS AND COMPUTING, TIOTC 2021, 2022, : 61 - 78
  • [15] Transparent rule generator random forest (TRG-RF): an interpretable random forest
    Boruah, Arpita Nath
    Biswas, Saroj Kumar
    Bandyopadhyay, Sivaji
    EVOLVING SYSTEMS, 2023, 14 (01) : 69 - 83
  • [16] Transparent rule generator random forest (TRG-RF): an interpretable random forest
    Arpita Nath Boruah
    Saroj Kumar Biswas
    Sivaji Bandyopadhyay
    Evolving Systems, 2023, 14 : 69 - 83
  • [17] M-ary Random Forest - A new multidimensional partitioning approach to Random Forest
    Vikas Jain
    Ashish Phophalia
    Multimedia Tools and Applications, 2021, 80 : 35217 - 35238
  • [18] The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance
    Agjee, Na'eem Hoosen
    Mutanga, Onisimo
    Peerbhay, Kabir
    Ismail, Riyad
    JOURNAL OF SPECTROSCOPY, 2018, 2018
  • [19] On Reducing the Bias of Random Forest
    Adnan, Md Nasim
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II, 2022, 13726 : 187 - 195
  • [20] Random forest classifier with R
    Ghattas, Badih
    JOURNAL OF THE SFDS, 2019, 160 (02): : 97 - 98