Model-based clustering via mixtures of unrestricted skew normal factor analyzers with complete and incomplete data

Cited by: 1
Authors
Wang, Wan-Lun [3, 4]
Lin, Tsung-I [1, 2]
Affiliations
[1] Natl Chung Hsing Univ, Inst Stat, Taichung 402, Taiwan
[2] China Med Univ, Dept Publ Hlth, Taichung 404, Taiwan
[3] Natl Cheng Kung Univ, Dept Stat, Tainan 701, Taiwan
[4] Natl Cheng Kung Univ, Inst Data Sci, Tainan 701, Taiwan
Source
STATISTICAL METHODS AND APPLICATIONS | 2023 / Volume 32 / Issue 03
Keywords
Canonical fundamental skew normal distribution; ECM algorithm; Mixture of factor analyzers; Missing at random; Multivariate truncated normal distribution; Unrestricted multivariate skew normal distribution; MAXIMUM-LIKELIHOOD-ESTIMATION; MULTIVARIATE
DOI
10.1007/s10260-022-00674-x
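Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract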
Mixtures of factor analyzers (MFA) based on the restricted skew normal distribution (rMSN) have emerged as a flexible tool to handle asymmetrical high-dimensional data with heterogeneity. However, the rMSN distribution is often criticized for lacking sufficient ability to accommodate potential skewness arising from more than one feature space. This paper presents an alternative extension of MFA by assuming the unrestricted skew normal (uMSN) distribution for the component factors. In particular, the proposed mixtures of unrestricted skew normal factor analyzers (MuSNFA) can simultaneously capture multiple directions of skewness and deal with the occurrence of missing values or nonresponses. Under the missing at random (MAR) mechanism, we develop a computationally feasible expectation conditional maximization (ECM) algorithm for computing the maximum likelihood estimates of model parameters. Practical aspects related to model-based clustering, prediction of factor scores and imputation of missing values are also discussed. The utility of the proposed methodology is illustrated with the analysis of simulated and real datasets.
Pages: 787-817
Number of pages: 31