Trees, forests, chickens, and eggs: when and why to prune trees in a random forest

Cited by: 6
Authors
Zhou, Siyu [1]
Mentch, Lucas [1]
Affiliations
[1] Univ Pittsburgh, Dept Stat, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA);
Keywords
bagging; degrees of freedom; interpolation; model selection; regularization; PERFORMANCE; PREDICTION; MODELS; IMPACT;
DOI
10.1002/sam.11594
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to their long-standing reputation as excellent off-the-shelf predictors, random forests (RFs) continue to remain a go-to model of choice for applied statisticians and data scientists. Despite their widespread use, however, until recently, little was known about their inner workings and about which aspects of the procedure were driving their success. Very recently, two competing hypotheses have emerged: one based on interpolation and the other based on regularization. This work argues in favor of the latter by utilizing the regularization framework to reexamine the decades-old question of whether individual trees in an ensemble ought to be pruned. Despite the fact that default constructions of RFs use near full depth trees in most popular software packages, here we provide strong evidence that tree depth should be seen as a natural form of regularization across the entire procedure. In particular, our work suggests that RFs with shallow trees are advantageous when the signal-to-noise ratio in the data is low. In building up this argument, we also critique the newly popular notion of "double descent" in RFs by drawing parallels to U-statistics and arguing that the noticeable jumps in random forest accuracy are the result of simple averaging rather than interpolation.
Pages: 45-64
Page count: 20
Related papers
  • [1] Decision trees and random forests
    Becker, Thijs
    Rousseau, Axel-Jan
    Geubbelmans, Melvin
    Burzykowski, Tomasz
    Valkenborg, Dirk
    AMERICAN JOURNAL OF ORTHODONTICS AND DENTOFACIAL ORTHOPEDICS, 2023, 164 (06) : 894 - 897
  • [2] Specific Random Trees for Random Forest
    Liu, Zhi
    Sun, Zhaocai
    Wang, Hongjun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (03) : 739 - 741
  • [3] NUMBER OF TREES IN A RANDOM FOREST
    PALMER, EM
    SCHWENK, AJ
    JOURNAL OF COMBINATORIAL THEORY SERIES B, 1979, 27 (02) : 109 - 121
  • [4] Random Trees Are the Cornerstones of Natural Forests
    Zhang, Gongqiao
    Hui, Gangying
    FORESTS, 2021, 12 (08):
  • [5] Small trees in supercritical random forests
    Lei, Tao
    CANADIAN MATHEMATICAL BULLETIN-BULLETIN CANADIEN DE MATHEMATIQUES, 2021, 64 (03) : 605 - 623
  • [6] On the Selection of Decision Trees in Random Forests
    Bernard, Simon
    Heutte, Laurent
    Adam, Sebastien
    IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, 2009, : 790 - 795
  • [7] Can’t see the forest for the trees: Analyzing groves to explain random forests
    Szepannek G.
    Holt B.-H.
    Behaviormetrika, 2024, 51 (1) : 411 - 423
  • [8] Random forest with acceptance–rejection trees
    Peter Calhoun
    Melodie J. Hallett
    Xiaogang Su
    Guy Cafri
    Richard A. Levine
    Juanjuan Fan
    Computational Statistics, 2020, 35 : 983 - 999
  • [9] Random forests with stochastic induction of decision trees
    Tsouros, Dimosthenis C.
    Smyrlis, Panagiotis N.
    Tsipouras, Markos G.
    Giannakeas, Nikolaos
    Tzallas, Alexandros T.
    2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2018, : 527 - 531
  • [10] Nodes of large degree in random trees and forests
    Gittenberger, B
    RANDOM STRUCTURES & ALGORITHMS, 2006, 28 (03) : 374 - 385