Bayesian Poisson hierarchical models for crash data analysis: Investigating the impact of model choice on site-specific predictions

Cited by: 14
Authors
Khazraee, S. Hadi [1]
Johnson, Valen [2]
Lord, Dominique [3]
Affiliations
[1] Uber Technol Inc, San Francisco, CA 94103 USA
[2] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[3] Texas A&M Univ, Zachry Dept Civil Engn, College Stn, TX 77843 USA
Keywords
Poisson hierarchical; Bayesian model; Site-specific prediction; Model choice; Poisson-inverse gamma; Deviance information criteria; Generalized linear model; Motor vehicle crashes; Dispersion parameter; Statistical analysis; Regression model; Gamma models; Frequency; Safety; Identification
DOI
10.1016/j.aap.2018.04.016
Chinese Library Classification (CLC) number
TB18 [Ergonomics]
Discipline classification code
1201
Abstract
The Poisson-gamma (PG) and Poisson-lognormal (PLN) regression models are among the most popular tools for analyzing motor vehicle crash data. Both belong to the Poisson-hierarchical family of models. While numerous studies have compared the overall performance of alternative Bayesian Poisson-hierarchical models, little research has addressed the impact of model choice on the predicted expected crash frequency at individual sites. This paper examined whether there are any trends among the candidate models' predictions, e.g., whether one model's prediction for sites with certain conditions tends to be higher (or lower) than that of another model. In addition to the PG and PLN models, this research formulated a new member of the Poisson-hierarchical family: the Poisson-inverse gamma (PIGam) model. Three field datasets (from Texas, Michigan, and Indiana) covering a wide range of over-dispersion characteristics were selected for analysis. The study demonstrated that model choice can be critical when the calibrated models are used for prediction at new sites, especially when the data are highly over-dispersed. For all three datasets, the PIGam model predicted the highest expected crash frequencies, followed by the PLN model and then the PG model, indicating a clear link between the models' predictions and the shapes of their mixing distributions (inverse gamma, lognormal, and gamma, respectively). The thicker tails of the PIGam and PLN models (in that order) may provide an advantage when the data are highly over-dispersed. The results also illustrated a major deficiency of the Deviance Information Criterion (DIC) in comparing the goodness-of-fit of hierarchical models: models with drastically different sets of coefficients (and thus different predictions for new sites) may yield similar DIC values, because the DIC accounts only for the parameters in the lowest (observation) level of the hierarchy and ignores the higher levels (regression coefficients).
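The link between tail thickness and the mixing distribution can be illustrated with a small simulation. The sketch below is not the authors' formulation: it assumes each model multiplies a site's Poisson mean by a random effect with mean 1 and a shared variance, and the function names, the variance of 1, and the base mean of 2 crashes are all hypothetical choices made for illustration. It draws the multipliers from gamma, lognormal, and inverse gamma distributions matched on mean and variance, then compares their upper quantiles.

```python
import numpy as np


def draw_multipliers(kind, var, size, rng):
    """Draw multipliers eps with E[eps] = 1 and Var[eps] = var (assumed parameterization)."""
    if kind == "gamma":
        # Gamma(shape=1/var, scale=var): mean 1, variance var.
        return rng.gamma(shape=1.0 / var, scale=var, size=size)
    if kind == "lognormal":
        # exp(sigma^2) - 1 = var and mu = -sigma^2 / 2 give mean 1, variance var.
        sigma2 = np.log1p(var)
        return rng.lognormal(mean=-0.5 * sigma2, sigma=np.sqrt(sigma2), size=size)
    if kind == "inverse_gamma":
        # InvGamma(shape=a, scale=a-1) with a = 2 + 1/var: mean 1, variance var.
        a = 2.0 + 1.0 / var
        b = a - 1.0
        # If X ~ Gamma(shape=a, rate=b), then 1/X ~ InvGamma(shape=a, scale=b).
        return 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=size)
    raise ValueError(f"unknown mixing distribution: {kind}")


def tail_summary(var=1.0, mu=2.0, size=1_000_000, seed=0):
    """Summarize the upper tail of each mixing distribution and of the mixed counts."""
    rng = np.random.default_rng(seed)
    summary = {}
    for kind in ("gamma", "lognormal", "inverse_gamma"):
        eps = draw_multipliers(kind, var, size, rng)
        counts = rng.poisson(mu * eps)  # hierarchical draw: count | eps ~ Poisson(mu * eps)
        summary[kind] = {
            "mean_eps": float(eps.mean()),
            "q999_eps": float(np.quantile(eps, 0.999)),
            "max_count": int(counts.max()),
        }
    return summary


summary = tail_summary()
for kind, stats in summary.items():
    print(kind, stats)
```

Under these assumptions the three multipliers share the same first two moments, so any difference in their extreme quantiles comes from the shape of the distribution alone. This only illustrates tail behavior; it does not reproduce the paper's fitted models or site-specific predictions.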
Pages: 181-195
Number of pages: 15