Akaike information criterion;
cluster analysis;
EM algorithm;
mixture models;
model selection;
variable selection;
DOI: 10.1198/016214506000000861
Chinese Library Classification:
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Subject Classification Codes:
020208 ;
070103 ;
0714 ;
Abstract:
We examine the problem of jointly selecting the number of components and variables in finite mixture regression models. We find that the Akaike information criterion is unsatisfactory for this purpose because it overestimates the number of components, which in turn results in incorrect variables being retained in the model. Therefore, we derive a new information criterion, the mixture regression criterion (MRC), that yields marked improvement in model selection due to what we call the "clustering penalty function." Moreover, we prove the asymptotic efficiency of the MRC. We show that it performs well in Monte Carlo studies for the same or different covariates across components with equal or unequal sample sizes. We also present an empirical example on sales territory management to illustrate the application and efficacy of the MRC. Finally, we generalize the MRC to mixture quasi-likelihood and mixture autoregressive models, thus extending its applicability to non-Gaussian models, discrete responses, and dependent data.
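As a rough, self-contained illustration of the joint selection problem described above — not the authors' MRC, whose clustering penalty function is not reproduced in this abstract — the following Python sketch fits Gaussian mixture regressions by the EM algorithm and compares candidate numbers of components using AIC, the criterion the paper shows tends to overestimate the number of components. All function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def fit_mixture_regression(X, y, K, n_iter=200, seed=0):
    """EM for a K-component Gaussian mixture of linear regressions.

    Returns mixing weights, coefficient vectors, variances, and the
    maximized log-likelihood. A minimal sketch, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    beta = rng.normal(scale=0.5, size=(K, d))
    sigma2 = np.full(K, y.var() + 1e-6)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each case
        resid = y[:, None] - X @ beta.T                      # (n, K)
        logdens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        logw = np.log(pi) + logdens
        logw -= logw.max(axis=1, keepdims=True)
        r = np.exp(logw)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: mixing weights and weighted least squares per component
        pi = r.mean(axis=0)
        for k in range(K):
            W = r[:, k]
            XtW = X.T * W
            beta[k] = np.linalg.solve(XtW @ X + 1e-8 * np.eye(d), XtW @ y)
            e = y - X @ beta[k]
            sigma2[k] = (W @ e**2) / W.sum() + 1e-8

    resid = y[:, None] - X @ beta.T
    loglik = np.sum(np.log((pi * np.exp(-0.5 * resid**2 / sigma2)
                            / np.sqrt(2 * np.pi * sigma2)).sum(axis=1)))
    return pi, beta, sigma2, loglik

def aic(loglik, K, d):
    # K coefficient vectors, K variances, K - 1 free mixing weights
    n_params = K * (d + 1) + (K - 1)
    return -2 * loglik + 2 * n_params

# Toy data from a two-component mixture; compare candidate K by AIC.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.random(n) < 0.5
y = np.where(z, 1.0 + 2.0 * X[:, 1], -1.0 - 2.0 * X[:, 1]) \
    + rng.normal(scale=0.5, size=n)

for K in (1, 2, 3):
    *_, loglik = fit_mixture_regression(X, y, K)
    print(K, round(aic(loglik, K, X.shape[1]), 1))
```

The MRC would replace the `aic` function here with a penalty that additionally accounts for the clustering structure; its exact form is given in the paper, not in this abstract.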