Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

Cited by: 34
Authors
Lo, Kenneth [2]
Gottardo, Raphael [1]
Affiliations
[1] Fred Hutchinson Cancer Research Center, Vaccine & Infectious Disease Division, Seattle, WA 98104 USA
[2] University of Washington, Department of Microbiology, Seattle, WA 98195 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Box-Cox transformation; Clustering; EM algorithm; Outliers; Robustness; Skewness; MAXIMUM-LIKELIHOOD-ESTIMATION; HIGH-DIMENSIONAL DATA; DISCRIMINANT-ANALYSIS; LARGE DATASETS; ML-ESTIMATION; EM; ALGORITHM; ECM; BIOCONDUCTOR; SEGMENTATION;
DOI
10.1007/s11222-010-9204-1
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
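The abstract's central idea, a heavy-tailed t density applied to Box-Cox-transformed data, can be illustrated with a minimal univariate sketch. This is not the authors' implementation: it assumes the standard Box-Cox transformation for positive data, a single component, and one dimension, and the function names `box_cox`, `t_logpdf`, and `t_box_cox_logpdf` are hypothetical. The paper itself works with multivariate mixtures fitted by an EM algorithm.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y (assumed form)."""
    if y <= 0:
        raise ValueError("y must be positive")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def t_logpdf(x, mu, sigma, nu):
    """Log-density of a univariate Student t with location mu, scale sigma
    and nu degrees of freedom."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def t_box_cox_logpdf(y, mu, sigma, nu, lam):
    """Log-density of y on the original scale when box_cox(y, lam) is
    modeled by the t density above.

    The extra term is the log-Jacobian of the transformation,
    d/dy box_cox(y, lam) = y ** (lam - 1).
    """
    log_jacobian = (lam - 1.0) * math.log(y)
    return t_logpdf(box_cox(y, lam), mu, sigma, nu) + log_jacobian
```

With `lam = 1` the transformation is a pure shift and the Jacobian term vanishes, so the model reduces to an ordinary t density; values of `lam` away from 1 produce a skewed density on the original scale, while small `nu` gives the heavy tails that absorb outliers.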
Pages: 33-52
Page count: 20
Journal: Statistics and Computing, 2012, vol. 22