Minimax Rates for High-Dimensional Random Tessellation Forests

Cited by: 0
Authors
O'Reilly, Eliza [1]
Tran, Ngoc Mai [2]
Affiliations
[1] Johns Hopkins Univ, Appl Math & Stat Dept, Baltimore, MD 21218 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
Keywords
random forest regression; Mondrian process; STIT tessellation; Poisson hyperplane tessellation; minimax risk bound; GEOMETRY
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Random forests are a popular class of algorithms used for regression and classification. The algorithm introduced by Breiman in 2001 and many of its variants are ensembles of randomized decision trees built from axis-aligned partitions of the feature space. One such variant, called Mondrian forests, was proposed to handle the online setting and is the first class of random forests for which minimax optimal rates were obtained in arbitrary dimension. However, the restriction to axis-aligned splits fails to capture dependencies between features, and random forests that use oblique splits have shown improved empirical performance for many tasks. This work shows that a large class of random forests with general split directions also achieves minimax optimal rates in arbitrary dimension. This class includes STIT forests, a generalization of Mondrian forests to arbitrary split directions, and random forests derived from Poisson hyperplane tessellations. These are the first results showing that random forest variants with oblique splits can obtain minimax optimality in arbitrary dimension. Our proof technique relies on the novel application of the theory of stationary random tessellations in stochastic geometry to statistical learning theory.
Pages: 32
Related papers
50 in total
  • [1] Random forests for high-dimensional longitudinal data
    Capitaine, Louis
    Genuer, Robin
    Thiebaut, Rodolphe
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (01) : 166 - 184
  • [2] ASYMPTOTIC PROPERTIES OF HIGH-DIMENSIONAL RANDOM FORESTS
    Chi, Chien-Ming
    Vossler, Patrick
    Fan, Yingying
    Lv, Jinchi
    [J]. ANNALS OF STATISTICS, 2022, 50 (06) : 3415 - 3438
  • [3] MINIMAX RATES IN SPARSE, HIGH-DIMENSIONAL CHANGE POINT DETECTION
    Liu, Haoyang
    Gao, Chao
    Samworth, Richard J.
    [J]. ANNALS OF STATISTICS, 2021, 49 (02) : 1081 - 1112
  • [4] Interaction Detection with Random Forests in High-Dimensional Data
    Winham, Stacey
    Wang, Xin
    de Andrade, Mariza
    Freimuth, Robert
    Colby, Colin
    Huebner, Marianne
    Biernacka, Joanna
    [J]. GENETIC EPIDEMIOLOGY, 2012, 36 (02) : 142 - 142
  • [5] Random Tessellation Forests
    Ge, Shufei
    Wang, Shijia
    Teh, Yee Whye
    Wang, Liangliang
    Elliott, Lloyd T.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] CONFIDENCE INTERVALS FOR HIGH-DIMENSIONAL LINEAR REGRESSION: MINIMAX RATES AND ADAPTIVITY
    Cai, T. Tony
    Guo, Zijian
    [J]. ANNALS OF STATISTICS, 2017, 45 (02) : 615 - 646
  • [7] On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data
    Schwarz, Daniel F.
    Koenig, Inke R.
    Ziegler, Andreas
    [J]. BIOINFORMATICS, 2010, 26 (14) : 1752 - 1758
  • [8] HYPOTHESIS TESTING FOR DENSITIES AND HIGH-DIMENSIONAL MULTINOMIALS: SHARP LOCAL MINIMAX RATES
    Balakrishnan, Sivaraman
    Wasserman, Larry
    [J]. ANNALS OF STATISTICS, 2019, 47 (04) : 1893 - 1927
  • [9] SNP interaction detection with Random Forests in high-dimensional genetic data
    Winham, Stacey J.
    Colby, Colin L.
    Freimuth, Robert R.
    Wang, Xin
    de Andrade, Mariza
    Huebner, Marianne
    Biernacka, Joanna M.
    [J]. BMC BIOINFORMATICS, 2012, 13