Robustness of random forests for regression

Cited by: 64
Authors
Roy, Marie-Helene [1]
Larocque, Denis [1]
Affiliation
[1] HEC Montreal, Dept Management Sci, Montreal, PQ H3T 2A7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
random forest; quantile regression forest; robustness; median; ranks; least-absolute deviations;
DOI
10.1080/10485252.2012.715161
Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In this paper, we empirically investigate the robustness of random forests for regression problems. We also investigate the performance of six variations of the original random forest method, all aimed at improving robustness. These variations are based on three main ideas: (1) robustify the aggregation method, (2) robustify the splitting criterion and (3) apply a robust transformation to the response. More precisely, with the first idea, we use the median (or a weighted median), instead of the mean, to combine the predictions from the individual trees. With the second idea, we use least-absolute deviations from the median, instead of least squares, as the splitting criterion. With the third idea, we build the trees using the ranks of the response instead of its original values. The competing methods are compared via a simulation study with artificial data under two different types of contamination, and also on 13 real data sets. Our results show that all three ideas improve the robustness of the original random forest algorithm. However, a robust aggregation of the individual trees is generally more beneficial than a robust splitting criterion.
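A minimal pure-Python sketch of the three ideas named in the abstract, written for illustration only; it is not the authors' implementation. Assumptions: trees are grown on bootstrap samples with a random subset of `mtry` features tried at each node, as in the original random forest; leaves store the mean under least squares (`"ls"`) and the median under least-absolute deviations (`"lad"`); for the rank variant, ranks of the full response are used only to choose splits, while leaf values are computed from the original responses (the abstract does not say how rank-based predictions are mapped back to the response scale). Weighted-median aggregation and quantile regression forests are omitted. All function names (`fit_forest`, `predict_forest`, `build_tree`, etc.) are hypothetical.

```python
import random
import statistics


def sse(v):
    """Least-squares impurity: sum of squared deviations from the mean."""
    m = statistics.fmean(v)
    return sum((a - m) ** 2 for a in v)


def sad(v):
    """LAD impurity: sum of absolute deviations from the median."""
    med = statistics.median(v)
    return sum(abs(a - med) for a in v)


def ranks(v):
    """1-based ranks of v; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


# Criterion -> (impurity used to pick splits, statistic stored in leaves)
CRITERIA = {"ls": (sse, statistics.fmean), "lad": (sad, statistics.median)}


def build_tree(X, y, z, idx, criterion, mtry, min_leaf, rng):
    """Splits minimise the impurity of z; leaves summarise the original y."""
    impurity, leaf_stat = CRITERIA[criterion]
    best = None
    if len(idx) >= 2 * min_leaf:
        for f in rng.sample(range(len(X[0])), mtry):  # random feature subset
            for t in sorted({X[i][f] for i in idx})[:-1]:
                left = [i for i in idx if X[i][f] <= t]
                right = [i for i in idx if X[i][f] > t]
                if min(len(left), len(right)) < min_leaf:
                    continue
                cost = impurity([z[i] for i in left]) + impurity([z[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, f, t, left, right)
    if best is None or best[0] >= impurity([z[i] for i in idx]):
        return leaf_stat([y[i] for i in idx])  # leaf: a single number
    _, f, t, left, right = best

    def grow(ids):
        return build_tree(X, y, z, ids, criterion, mtry, min_leaf, rng)

    return (f, t, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=50, criterion="ls", use_ranks=False,
               mtry=1, min_leaf=5, seed=0):
    rng = random.Random(seed)
    z_all = ranks(y) if use_ranks else list(y)  # response used for splitting
    n, trees = len(y), []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb, zb = ([s[i] for i in boot] for s in (X, y, z_all))
        trees.append(build_tree(Xb, yb, zb, list(range(n)), criterion,
                                mtry, min_leaf, rng))
    return trees


def predict_forest(trees, x, aggregate="mean"):
    preds = [predict_tree(tree, x) for tree in trees]
    if aggregate == "median":
        return statistics.median(preds)  # robust aggregation (idea 1)
    return statistics.fmean(preds)       # original random forest
```

Under these assumptions, `fit_forest(X, y, criterion="lad")` gives LAD splitting, `fit_forest(X, y, use_ranks=True)` gives rank-based splitting, and `predict_forest(trees, x, aggregate="median")` gives median aggregation; the options can be combined. The median of the tree predictions stays within the range of the unaffected trees as long as fewer than half of them are pulled away by an outlier, whereas the mean is shifted by every affected tree.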
Pages: 993-1006
Number of pages: 14
Related papers
  • [21] Pricing Bermudan Options Using Regression Trees/Random Forests
    Ech-Chafiq, Zineb El Filali
    Labordere, Pierre Henry
    Lelong, Jerome
    [J]. SIAM JOURNAL ON FINANCIAL MATHEMATICS, 2023, 14 (04): 1113-1139
  • [22] Distributional Random Forests: Heterogeneity Adjustment and Multivariate Distributional Regression
    Ćevid, Domagoj
    Michel, Loris
    Näf, Jeffrey
    Bühlmann, Peter
    Meinshausen, Nicolai
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [23] Use of random forests regression for predicting IRI of asphalt pavements
    Gong, Hongren
    Sun, Yiren
    Shu, Xiang
    Huang, Baoshan
    [J]. CONSTRUCTION AND BUILDING MATERIALS, 2018, 189: 890-897
  • [24] Bioprocess data mining using regularized regression and random forests
    Hassan, Syeda Sakira
    Farhan, Muhammad
    Mangayil, Rahul
    Huttunen, Heikki
    Aho, Tommi
    [J]. BMC SYSTEMS BIOLOGY, 2013, 7
  • [25] Prediction of Torpedo Initial Velocity Based on Random Forests Regression
    Zhang, Ling
    Wang, Pukai
    Jiang, Tianyuan
    Fan, Gehua
    Dan, Caihong
    [J]. 2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL I, 2015: 337-339
  • [27] Robust Regression Random Forests by Small and Noisy Training Data
    Min, Lev, V
    Kovalev, Maxim S.
    Coolen, Frank P. A.
    [J]. PROCEEDINGS OF 2019 XXII INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND MEASUREMENTS (SCM), 2019: 134-137
  • [28] Penalized semiparametric Cox regression model on XGBoost and random survival forests
    Wang, Yating
    Su, Jinxia
    Zhao, Xuejing
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023, 52 (07): 3095-3103
  • [29] Causal Random Forests Model Using Instrumental Variable Quantile Regression
    Chen, Jau-er
    Hsiang, Chen-Wei
    [J]. ECONOMETRICS, 2019, 7 (04)
  • [30] Comparing spatial regression to random forests for large environmental data sets
    Fox, Eric W.
    Ver Hoef, Jay M.
    Olsen, Anthony R.
    [J]. PLOS ONE, 2020, 15 (03)