A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data

Cited by: 41
Authors
Shirazi, Mohammadali [1]
Lord, Dominique [1]
Dhavala, Soma Sekhar [2]
Geedipally, Srinivas Reddy [3]
Affiliations
[1] Texas A&M Univ, Zachry Dept Civil Engn, College Stn, TX 77843 USA
[2] Perceptron Learning Solut Pvt Ltd, Bengaluru, India
[3] Texas A&M Univ, Texas A&M Transportat Inst, College Stn, TX 77843 USA
Source
ACCIDENT ANALYSIS AND PREVENTION, 2016
Keywords
Negative binomial; Dirichlet process; Generalized linear model; Crash data; DIRICHLET; INTERSECTIONS; INFERENCE; VARIANCE; MIXTURES; BAYES
DOI
10.1016/j.aap.2016.02.020
Chinese Library Classification (CLC) number
TB18 [Ergonomics]
Discipline classification code
1201
Abstract
Crash data can often be characterized by over-dispersion, heavy (long) tail and many observations with the value zero. Over the last few years, a small number of researchers have started developing and applying novel and innovative multi-parameter models to analyze such data. These multi-parameter models have been proposed for overcoming the limitations of the traditional negative binomial (NB) model, which cannot handle this kind of data efficiently. The research documented in this paper continues the work related to multi-parameter models. The objective of this paper is to document the development and application of a flexible NB generalized linear model with randomly distributed mixed effects characterized by the Dirichlet process (NB-DP) to model crash data. The objective of the study was accomplished using two datasets. The new model was compared to the NB and the recently introduced model based on the mixture of the NB and Lindley (NB-L) distributions. Overall, the research study shows that the NB-DP model offers a better performance than the NB model once data are over-dispersed and have a heavy tail. The NB-DP performed better than the NB-L when the dataset has a heavy tail, but a smaller percentage of zeros. However, both models performed similarly when the dataset contained a large amount of zeros. In addition to a greater flexibility, the NB-DP provides a clustering by-product that allows the safety analyst to better understand the characteristics of the data, such as the identification of outliers and sources of dispersion. (C) 2016 Elsevier Ltd. All rights reserved.
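As an illustration of the idea in the abstract, the following minimal sketch simulates counts from an NB model whose site-level effect is drawn from a Dirichlet process. It is not the authors' specification or estimation code. The log link, the N(0, 1) base measure, the truncated stick-breaking construction, and every name and default value (`simulate_nb_dp`, `alpha`, `n_atoms`, `phi`, `base_mean`) are assumptions made here for demonstration only.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process (assumed construction)."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def simulate_nb_dp(n_sites, alpha=1.0, n_atoms=20, phi=2.0, base_mean=1.5, seed=0):
    """Simulate counts with a DP-distributed effect on the NB mean (illustrative only)."""
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = rng.normal(0.0, 1.0, size=n_atoms)  # assumed base measure: N(0, 1)
    cluster = rng.choice(n_atoms, size=n_sites, p=weights)  # cluster label per site
    mu = base_mean * np.exp(atoms[cluster])  # assumed log link
    # NB written as a Poisson-gamma mixture: gamma multiplier with mean 1, shape phi
    lam = mu * rng.gamma(shape=phi, scale=1.0 / phi, size=n_sites)
    counts = rng.poisson(lam)
    return counts, cluster, weights


if __name__ == "__main__":
    counts, cluster, weights = simulate_nb_dp(5000)
    print("mean:", counts.mean())
    print("variance:", counts.var())
    print("share of zeros:", (counts == 0).mean())
    print("largest cluster sizes:", np.sort(np.bincount(cluster))[::-1][:5])
```

The cluster labels returned alongside the counts correspond loosely to the clustering by-product mentioned in the abstract: sites that share a label share the same random effect, so unusually small clusters with large effects are candidates for closer inspection.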
Pages: 10 - 18
Number of pages: 9
Related papers
26 in total
  • [1] Modeling over-dispersed crash data with a long tail: Examining the accuracy of the dispersion parameter in Negative Binomial models
    Zou, Yajie
    Wu, Lingtao
    Lord, Dominique
    ANALYTIC METHODS IN ACCIDENT RESEARCH, 2015, 5-6 : 1 - 16
  • [2] Developing a Random Parameters Negative Binomial-Lindley Model to analyze highly over-dispersed crash count data
    Shaon, Mohammad Razaur Rahman
    Qin, Xiao
    Shirazi, Mohammadali
    Lord, Dominique
    Geedipally, Srinivas Reddy
    ANALYTIC METHODS IN ACCIDENT RESEARCH, 2018, 18 : 33 - 44
  • [3] Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros
    Moghimbeigi, Abbas
    Eshraghian, Mohammed Reza
    Mohammad, Kazem
    McArdle, Brian
    JOURNAL OF APPLIED STATISTICS, 2008, 35 (10) : 1193 - 1202
  • [4] Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros
    Yau, KKW
    Wang, K
    Lee, AH
    BIOMETRICAL JOURNAL, 2003, 45 (04) : 437 - 452
  • [5] Modeling under or over-dispersed binomial count data by using extended Altham distribution families
    Asma, Senay
    HACETTEPE JOURNAL OF MATHEMATICS AND STATISTICS, 2021, 50 (01) : 255 - 274
  • [6] The negative binomial-Lindley generalized linear model: Characteristics and application using crash data
    Geedipally, Srinivas Reddy
    Lord, Dominique
    Dhavala, Soma Sekhar
    ACCIDENT ANALYSIS AND PREVENTION, 2012, 45 : 258 - 265
  • [7] A New Compound Distribution and Its Applications in Over-dispersed Count Data
    Ahmad P.B.
    Wani M.K.
    Annals of Data Science, 2024, 11 (05) : 1799 - 1820
  • [8] A spline-based semiparametric sieve likelihood method for over-dispersed panel count data
    Hua, Lei
    Zhang, Ying
    Tu, Wanzhu
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2014, 42 (02) : 217 - 245
  • [9] Tree-structured logistic model for over-dispersed binomial data with application to modeling developmental effects
    Ahn, H
    Chen, JJ
    BIOMETRICS, 1997, 53 (02) : 435 - 455
  • [10] Bivariate negative binomial generalized linear models for environmental count data
    Iwasaki, Masakazu
    Tsubaki, Hiroe
    JOURNAL OF APPLIED STATISTICS, 2006, 33 (09) : 909 - 923