Parallel Bayesian Additive Regression Trees

Cited by: 39
Authors
Pratola, Matthew T. [1 ]
Chipman, Hugh A. [2 ]
Gattiker, James R. [3 ]
Higdon, David M. [3 ]
McCulloch, Robert [4 ]
Rust, William N. [3 ]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Acadia Univ, Dept Math & Stat, Wolfville, NS B4P 2R6, Canada
[3] Los Alamos Natl Lab, Stat Sci Grp, Los Alamos, NM 87545 USA
[4] Univ Chicago, Booth Sch Business, Chicago, IL 60637 USA
Funding
National Science Foundation (USA);
Keywords
Big Data; Markov chain Monte Carlo; Nonlinear; Scalable; Statistical computing; COMPUTER-MODEL CALIBRATION; DESIGN;
DOI
10.1080/10618600.2013.841584
CLC number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline codes
020208; 070103; 0714;
Abstract
Bayesian additive regression trees (BART) is a Bayesian approach to flexible nonlinear regression which has been shown to be competitive with the best modern predictive methods such as those based on bagging and boosting. BART offers some advantages. For example, the stochastic search Markov chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with over a thousand observations, and is unlikely to even run with 50,000 to 100,000 observations. In this article we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented using the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
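The SPMD strategy the abstract describes can be sketched in a few lines: each MPI rank holds one shard of the data, a tree-node update only needs per-leaf sufficient statistics (count, residual sum, residual sum of squares), and those are computed locally and combined with an allreduce. The sketch below is illustrative, not the paper's code: it uses plain Python and simulates the allreduce by summation, and the helper names (`shard_stats`, `allreduce`) are assumptions.

```python
# Sketch of the SPMD sufficient-statistic reduction behind parallel BART.
# In the real implementation each MPI rank holds one data shard and the
# combine step is an MPI allreduce; here 4 "ranks" are simulated in-process.

def shard_stats(residuals):
    """Local sufficient statistics for one tree leaf on one data shard."""
    n = len(residuals)
    s = sum(residuals)
    ss = sum(r * r for r in residuals)
    return (n, s, ss)

def allreduce(stats_per_rank):
    """Stand-in for an MPI allreduce: elementwise sum across ranks."""
    return tuple(sum(vals) for vals in zip(*stats_per_rank))

# Full dataset of residuals, split across 4 simulated ranks.
data = [0.5, -1.2, 0.3, 0.9, -0.4, 1.1, -0.7, 0.2]
shards = [data[i::4] for i in range(4)]

local = [shard_stats(shard) for shard in shards]
n, s, ss = allreduce(local)

# The combined statistics match a single-machine pass over all the data
# (up to floating-point rounding), which is why the parallel algorithm
# produces the same MCMC draws as the serial one.
full_n, full_s, full_ss = shard_stats(data)
assert n == full_n
assert abs(s - full_s) < 1e-12
assert abs(ss - full_ss) < 1e-12
```

Because only these small statistic tuples cross the network, communication cost is independent of the shard sizes, which is consistent with the near-linear scaling and the ability to handle data too large for any single repository.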
Pages: 830-852
Page count: 23
Related Articles
50 records in total
  • [1] Bayesian additive regression trees with model trees
    Prado, Estevao B.
    Moral, Rafael A.
    Parnell, Andrew C.
    [J]. STATISTICS AND COMPUTING, 2021, 31 (03)
  • [2] BART: BAYESIAN ADDITIVE REGRESSION TREES
    Chipman, Hugh A.
    George, Edward I.
    McCulloch, Robert E.
    [J]. ANNALS OF APPLIED STATISTICS, 2010, 4 (01): 266-298
  • [3] Particle Gibbs for Bayesian Additive Regression Trees
    Lakshminarayanan, Balaji
    Roy, Daniel M.
    Teh, Yee Whye
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38: 553-561
  • [4] XBART: Accelerated Bayesian Additive Regression Trees
    He, Jingyu
    Yalov, Saar
    Hahn, P. Richard
    [J]. 22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Partially fixed Bayesian additive regression trees
    Ran, Hao
    Bai, Yang
    [J]. STATISTICAL THEORY AND RELATED FIELDS, 2024, 8 (03): 232-242
  • [6] Multinomial probit Bayesian additive regression trees
    Kindo, Bereket P.
    Wang, Hao
    Pena, Edsel A.
    [J]. STAT, 2016, 5 (01): 119-131
  • [7] Bayesian Additive Regression Trees using Bayesian model averaging
    Hernandez, Belinda
    Raftery, Adrian E.
    Pennington, Stephen R.
    Parnell, Andrew C.
    [J]. STATISTICS AND COMPUTING, 2018, 28 (04): 869-890
  • [8] Bayesian additive regression trees for multivariate skewed responses
    Um, Seungha
    Linero, Antonio R.
    Sinha, Debajyoti
    Bandyopadhyay, Dipankar
    [J]. STATISTICS IN MEDICINE, 2023, 42 (03): 246-263