Genome-wide prediction using Bayesian additive regression trees

Cited by: 34
Author
Waldmann, Patrik [1]
Affiliation
[1] Swedish Univ Agr Sci SLU, Genet Swedish Univ Agr, Box 7023, S-75007 Uppsala, Sweden
Keywords
VARIABLE SELECTION; TRAITS; CLASSIFICATION; PLANT; BART;
DOI
10.1186/s12711-016-0219-8
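Chinese Library Classification (CLC) number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture];
Discipline classification code
0905;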
Abstract
Background: The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is high-dimensional data from many thousands of SNPs scored on several thousands of individuals. A large number of methods have been developed for GWP, which are mostly parametric methods that assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on the sum of nonparametric regression trees with the priors being used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which will automatically model nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods.
Results: Tests on the QTLMAS2010 simulated data, which are mainly based on additive genetic effects, show that cross-validated optimization of BART provides a smaller prediction error than the RF, BLASSO, GBLUP and RKHS methods, and is almost as accurate as the LASSO method. If dominance and epistasis effects are added to the QTLMAS2010 data, the accuracy of BART relative to the other methods was increased. We also showed that BART can produce importance measures on the SNPs through variable inclusion proportions. In evaluations using real data on pigs, the prediction error was smaller with BART than with the other methods.
Conclusions: BART was shown to be an accurate method for GWP, in which the regression trees guarantee a very sparse representation of additive and complex non-additive genetic effects. Moreover, the Markov chain Monte Carlo algorithm with Bayesian back-fitting provides a computationally efficient procedure that is suitable for high-dimensional genomic data.
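The abstract's claim that a sum of binary-partition trees can represent dominance and epistasis can be illustrated with a minimal Python sketch. This is not the author's implementation and it is not BART itself: the trees below are built by hand rather than sampled by MCMC under regularizing priors, and the names (`Leaf`, `Split`, `predict_sum_of_trees`), the 0/1/2 genotype coding, the thresholds and the leaf values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    value: float  # contribution of this terminal node to the prediction


@dataclass
class Split:
    snp: int          # index of the SNP tested at this node
    threshold: float  # go left if genotype <= threshold, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict_tree(node: Node, genotypes: Sequence[int]) -> float:
    """Walk one tree from the root to a leaf for a single individual."""
    while isinstance(node, Split):
        node = node.left if genotypes[node.snp] <= node.threshold else node.right
    return node.value


def predict_sum_of_trees(trees: Sequence[Node], genotypes: Sequence[int]) -> float:
    """Add up the leaf values reached in every tree (the additive part)."""
    return sum(predict_tree(tree, genotypes) for tree in trees)


# Hypothetical tree 1: complete dominance at SNP 0.
# Genotypes 1 and 2 get the same effect, so the response is not linear
# in the allele count.
dominance_tree = Split(snp=0, threshold=0.5, left=Leaf(0.0), right=Leaf(1.0))

# Hypothetical tree 2: epistasis between SNP 1 and SNP 2.
# The effect appears only when both SNPs carry at least one copy.
epistasis_tree = Split(
    snp=1,
    threshold=0.5,
    left=Leaf(0.0),
    right=Split(snp=2, threshold=0.5, left=Leaf(0.0), right=Leaf(2.0)),
)

trees = [dominance_tree, epistasis_tree]

if __name__ == "__main__":
    for genotypes in ([0, 0, 0], [1, 0, 0], [2, 0, 0], [2, 1, 0], [2, 1, 1]):
        print(genotypes, predict_sum_of_trees(trees, genotypes))
```

In this toy setup a single split on one SNP is enough to express a dominance pattern, and nesting a split on a second SNP inside the first expresses an interaction, without either effect being specified as a model term in advance.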
Pages: 12
Related papers
  • [1] Genome-wide prediction using Bayesian additive regression trees
    Patrik Waldmann
    [J]. Genetics Selection Evolution, 48
  • [2] Bayesian Additive Regression Trees using Bayesian model averaging
    Belinda Hernández
    Adrian E. Raftery
    Stephen R Pennington
    Andrew C. Parnell
    [J]. Statistics and Computing, 2018, 28 : 869 - 890
  • [3] Bayesian Additive Regression Trees using Bayesian model averaging
    Hernandez, Belinda
    Raftery, Adrian E.
    Pennington, Stephen R.
    Parnell, Andrew C.
    [J]. STATISTICS AND COMPUTING, 2018, 28 (04) : 869 - 890
  • [4] Prediction with missing data via Bayesian Additive Regression Trees
    Kapelner, Adam
    Bleich, Justin
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2015, 43 (02) : 224 - 239
  • [5] Model Mixing Using Bayesian Additive Regression Trees
    Yannotty, John C.
    Santner, Thomas J.
    Furnstahl, Richard J.
    Pratola, Matthew T.
    [J]. TECHNOMETRICS, 2024, 66 (02) : 196 - 207
  • [6] Variable Selection Using Bayesian Additive Regression Trees
    Luo, Chuji
    Daniels, Michael J.
    [J]. STATISTICAL SCIENCE, 2024, 39 (02) : 286 - 304
  • [7] Bayesian additive regression trees with model trees
    Prado, Estevao B.
    Moral, Rafael A.
    Parnell, Andrew C.
    [J]. STATISTICS AND COMPUTING, 2021, 31 (03)
  • [8] Bayesian additive regression trees with model trees
    Estevão B. Prado
    Rafael A. Moral
    Andrew C. Parnell
    [J]. Statistics and Computing, 2021, 31
  • [9] Genome-Wide Regression and Prediction with the BGLR Statistical Package
    Perez, Paulino
    de los Campos, Gustavo
    [J]. GENETICS, 2014, 198 (02) : 483 - U63
  • [10] Genome-wide prediction of discrete traits using bayesian regressions and machine learning
    Oscar González-Recio
    Selma Forni
    [J]. Genetics Selection Evolution, 43