RESIDUAL DIAGNOSTICS FOR MIXTURE-MODELS

Cited by: 56
Authors
LINDSAY, BG [1]
ROEDER, K [1]
Affiliation
[1] YALE UNIV, DEPT STAT, NEW HAVEN, CT 06520 USA
Keywords
EM ALGORITHM; EXPONENTIAL FAMILY MIXTURES; NONPARAMETRIC MIXTURES; OVERDISPERSION
DOI
10.2307/2290216
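Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract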
A sample is commonly modeled by a mixture distribution if the observations follow a common distribution, but the parameter of interest differs between observations. For example, we observe the lengths but not the ages of a sample of fish. It may be reasonable to assume that length is normally distributed about an unknown mean that depends on the age of the fish. Provided there is more than one age class in the sample, then the data are distributed as a mixture of normals. In this article we assume that the data are a random sample from a mixture of exponential family distributions and that for each observation the parameter of interest is sampled independently from an unknown mixing distribution Q. The adequacy of a fitted mixture model can be assessed by examining residuals based on the ratio of the observed to expected fit. Residuals based on the homogeneity model (in which Q is a one-point distribution) display a convexity property when the data follow a mixture model; this becomes the basis for diagnostic plots to detect the presence of mixing. Similar results also are obtained from smoothed residuals; thus the diagnostic also can be applied to sparse or continuous data. The nonparametric maximum likelihood estimate Q̂ of the distribution Q is known to be discrete. Smoothed residuals obtained from the fitted mixed model provide information about the number of support points in Q̂. This facilitates the use of the EM algorithm to find Q̂. The residuals evaluated at Q̂ determine whether or not the maximum likelihood estimate is unique and hence interpretable. Simulated and actual data sets are analyzed to illustrate the power and the utility of these procedures.
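The convexity property mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no explicit formula, so the following rests on assumptions: the residual is taken to be (observed probability) / (probability under the homogeneity fit) minus 1, the exponential family is Poisson, and the "observed" values are the exact probabilities of a two-component mixture rather than sample frequencies. The function names and the chosen means and weights are hypothetical and serve only as illustration.

```python
import math


def poisson_pmf(x, mean):
    """Poisson probability of count x for the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def mixture_pmf(x, means, weights):
    """Probability of count x under a finite Poisson mixture."""
    return sum(w * poisson_pmf(x, m) for m, w in zip(means, weights))


def homogeneity_residuals(observed, mean, support):
    """Assumed residual: observed / expected - 1, where 'expected' is the
    one-point (homogeneous) Poisson model with the given mean."""
    return [observed[x] / poisson_pmf(x, mean) - 1.0 for x in support]


support = list(range(0, 11))

# Hypothetical mixing distribution Q: mass 0.5 at mean 1 and 0.5 at mean 5.
means, weights = [1.0, 5.0], [0.5, 0.5]
observed = {x: mixture_pmf(x, means, weights) for x in support}

# The homogeneity model is fitted by matching the overall mean (3.0 here).
overall_mean = sum(w * m for m, w in zip(means, weights))
residuals = homogeneity_residuals(observed, overall_mean, support)

# Convexity check: second differences of the residual sequence.
second_diffs = [
    residuals[i + 1] - 2.0 * residuals[i] + residuals[i - 1]
    for i in range(1, len(residuals) - 1)
]

# Reference case: with no mixing, the residuals vanish.
flat = {x: poisson_pmf(x, overall_mean) for x in support}
null_residuals = homogeneity_residuals(flat, overall_mean, support)

for x, r in zip(support, residuals):
    print(f"x={x:2d}  residual={r:+.4f}")
```

Under these assumptions the residuals are positive in both tails and negative near the fitted mean, and every second difference is positive, which is the bowl shape a diagnostic plot would show when mixing is present. With real data the observed frequencies are noisy, which is where the smoothed residuals described in the abstract would come in.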
Pages: 785-794
Number of pages: 10