Diagnostic checks for discrete data regression models using posterior predictive simulations

Cited by: 83
Authors
Gelman, A [1]
Goegebeur, Y
Tuerlinckx, F
Van Mechelen, I
Affiliations
[1] Columbia Univ, Dept Stat, New York, NY 10027 USA
[2] Katholieke Univ Leuven, Louvain, Belgium
Keywords
Bayesian statistics; binary regression; generalized linear models; quantile-quantile plots; realized discrepancies; residual plots; sequential design; stochastic learning models
DOI
10.1111/1467-9876.00190
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Model checking with discrete data regressions can be difficult because the usual methods such as residual plots have complicated reference distributions that depend on the parameters in the model. Posterior predictive checks have been proposed as a Bayesian way to average the results of goodness-of-fit tests in the presence of uncertainty in estimation of the parameters. We try this approach using a variety of discrepancy variables for generalized linear models fitted to a historical data set on behavioural learning. We then discuss the general applicability of our findings in the context of a recent applied example on which we have worked. We find that the following discrepancy variables work well, in the sense of being easy to interpret and sensitive to important model failures: structured displays of the entire data set, general discrepancy variables based on plots of binned or smoothed residuals versus predictors and specific discrepancy variables created on the basis of the particular concerns arising in an application. Plots of binned residuals are especially easy to use because their predictive distributions under the model are sufficiently simple that model checks can often be made implicitly. The following discrepancy variables did not work well: scatterplots of latent residuals defined from an underlying continuous model and quantile-quantile plots of these residuals.
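The binned-residual check described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a logistic regression with a flat prior and a normal (Laplace) approximation to the posterior, and it uses the largest absolute binned residual as the discrepancy variable. The function names (fit_logistic_laplace, binned_residuals, ppc_binned) and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_logistic_laplace(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression.

    Returns the mode and the inverse Hessian, used here as a normal
    approximation to the posterior (ASSUMPTION: flat prior).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(hess)


def binned_residuals(x, resid, n_bins):
    """Average residual within bins of roughly equal size, ordered by x."""
    order = np.argsort(x)
    return np.array([resid[idx].mean() for idx in np.array_split(order, n_bins)])


def ppc_binned(X, y, x_bin, n_draws=500, n_bins=10, seed=0):
    """Posterior predictive p-value for the max absolute binned residual.

    For each approximate posterior draw, the realized discrepancy T(y, beta)
    is compared with T(y_rep, beta) for data replicated under that draw.
    """
    rng = np.random.default_rng(seed)
    beta_hat, cov = fit_logistic_laplace(X, y)
    draws = rng.multivariate_normal(beta_hat, cov, size=n_draws)
    exceed = 0
    for beta in draws:
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        y_rep = rng.binomial(1, p)  # replicated binary outcomes
        t_obs = np.max(np.abs(binned_residuals(x_bin, y - p, n_bins)))
        t_rep = np.max(np.abs(binned_residuals(x_bin, y_rep - p, n_bins)))
        exceed += int(t_rep >= t_obs)
    return exceed / n_draws


# Illustrative use on simulated data (not the data analysed in the paper):
# the true model has a quadratic term that the fitted linear model omits.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + x + x**2))))
X = np.column_stack([np.ones_like(x), x])
print("posterior predictive p-value:", ppc_binned(X, y, x, n_draws=200))
```

A p-value near 0 would suggest that the observed binned residuals are larger than the fitted model typically produces. In practice the binned residuals would usually also be plotted against the predictor, since the abstract stresses that such plots are easy to read directly.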
Pages: 247-268
Page count: 22