Parallel Bayesian Additive Regression Trees

Cited by: 39
Authors
Pratola, Matthew T. [1 ]
Chipman, Hugh A. [2 ]
Gattiker, James R. [3 ]
Higdon, David M. [3 ]
McCulloch, Robert [4 ]
Rust, William N. [3 ]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Acadia Univ, Dept Math & Stat, Wolfville, NS B4P 2R6, Canada
[3] Los Alamos Natl Lab, Stat Sci Grp, Los Alamos, NM 87545 USA
[4] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
Funding
National Science Foundation (USA);
Keywords
Big Data; Markov chain Monte Carlo; Nonlinear; Scalable; Statistical computing; COMPUTER-MODEL CALIBRATION; DESIGN;
DOI
10.1080/10618600.2013.841584
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Bayesian additive regression trees (BART) is a Bayesian approach to flexible nonlinear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this article we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
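The SPMD idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, for illustration only, that each worker holds one shard of the data, computes per-leaf sufficient statistics (count and sum of residuals) for a single hypothetical one-split tree, and that the shards' statistics are then summed, as an MPI all-reduce would do. The workers are simulated in plain Python rather than run under MPI, and every function name below is hypothetical.

```python
from typing import List, Sequence, Tuple

# Per-leaf statistics: one [count, sum_of_residuals] pair for each of two leaves.
LeafStats = List[List[float]]


def leaf_of(x: float, cut: float) -> int:
    """Assign an observation to leaf 0 or 1 of a one-split tree (hypothetical)."""
    return 0 if x < cut else 1


def local_stats(xs: Sequence[float], rs: Sequence[float], cut: float) -> LeafStats:
    """Statistics computed by one worker on its own shard only."""
    stats: LeafStats = [[0, 0.0], [0, 0.0]]
    for x, r in zip(xs, rs):
        leaf = leaf_of(x, cut)
        stats[leaf][0] += 1
        stats[leaf][1] += r
    return stats


def all_reduce_sum(per_worker: Sequence[LeafStats]) -> LeafStats:
    """Stand-in for an MPI sum reduction: add the workers' statistics elementwise."""
    total: LeafStats = [[0, 0.0], [0, 0.0]]
    for stats in per_worker:
        for leaf in range(2):
            total[leaf][0] += stats[leaf][0]
            total[leaf][1] += stats[leaf][1]
    return total


def spmd_leaf_stats(
    xs: Sequence[float], rs: Sequence[float], cut: float, n_workers: int
) -> LeafStats:
    """Shard the data round-robin, run the same program on each shard, combine."""
    shards: List[Tuple[Sequence[float], Sequence[float]]] = [
        (xs[w::n_workers], rs[w::n_workers]) for w in range(n_workers)
    ]
    return all_reduce_sum([local_stats(sx, sr, cut) for sx, sr in shards])
```

Because counts and sums are additive across shards, the combined statistics match what a single process would compute on the full data, which is the property that makes this kind of data-parallel split possible.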
Pages: 830-852
Page count: 23
Related papers
50 items in total
  • [41] Dynamic Treatment Regimes Using Bayesian Additive Regression Trees for Censored Outcomes
    Li, Xiao
    Logan, Brent R.
    Hossain, S. M. Ferdous
    Moodie, Erica E. M.
    [J]. LIFETIME DATA ANALYSIS, 2024, 30 (01) : 181 - 212
  • [42] Dynamic Treatment Regimes Using Bayesian Additive Regression Trees for Censored Outcomes
    Xiao Li
    Brent R. Logan
    S. M. Ferdous Hossain
    Erica E. M. Moodie
    [J]. Lifetime Data Analysis, 2024, 30 : 181 - 212
  • [43] Bayesian Additive Regression Trees (BART) with covariate adjusted borrowing in subgroup analyses
    Pan, Jane
    Bunn, Veronica
    Hupf, Bradley
    Lin, Jianchang
    [J]. JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2022, 32 (04) : 613 - 626
  • [44] Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees
    Green, Donald P.
    Kern, Holger L.
    [J]. PUBLIC OPINION QUARTERLY, 2012, 76 (03) : 491 - 511
  • [45] Additive groves of regression trees
    Sorokina, Daria
    Caruana, Rich
    Riedewald, Mirek
    [J]. MACHINE LEARNING: ECML 2007, PROCEEDINGS, 2007, 4701 : 323 - +
  • [46] Bayesian additive regression trees-based spam detection for enhanced email privacy
    Abu-Nimeh, Saeed
    Nappa, Dario
    Wang, Xinlei
    Nair, Suku
    [J]. ARES 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON AVAILABILITY, SECURITY AND RELIABILITY, 2008, : 1044 - 1051
  • [47] Incorporating external data into the analysis of clinical trials via Bayesian additive regression trees
    Zhou, Tianjian
    Ji, Yuan
    [J]. STATISTICS IN MEDICINE, 2021, 40 (28) : 6421 - 6442
  • [48] Decision making and uncertainty quantification for individualized treatments using Bayesian Additive Regression Trees
    Logan, Brent R.
    Sparapani, Rodney
    McCulloch, Robert E.
    Laud, Purushottam W.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (04) : 1079 - 1093
  • [49] Pedestrian crossing volume estimation at signalized intersections using Bayesian additive regression trees
    Li, Xiaofeng
    Xu, Peipei
    Wu, Yao-Jan
    [J]. JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 26 (05) : 557 - 571
  • [50] Distribution-Level Peak Load Prediction Based On Bayesian Additive Regression Trees
    Chen, Tairen
    Lehr, Jane
    Lavrova, Olga
    Martinez-Ramon, Manel
    [J]. 2016 IEEE POWER AND ENERGY SOCIETY GENERAL MEETING (PESGM), 2016,