Flexible High-Dimensional Unsupervised Learning with Missing Data

Cited by: 11
Authors
Wei, Yuhong [1]
Tang, Yang [1]
McNicholas, Paul D. [1]
Affiliations
[1] McMaster Univ, Dept Math & Stat, Hamilton, ON L8S 4L8, Canada
Keywords
Analytical models; Computational modeling; Data models; Unsupervised learning; Covariance matrices; Clustering algorithms; Mixture models; Clustering; factor analysis; generalized hyperbolic; missing data; mixture of factor analyzers; mixture model; model-based clustering; unsupervised classification; STOCHASTIC OZONE DAYS; T-FACTOR ANALYZERS; MIXTURE-MODELS; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; BAYESIAN-ANALYSIS; ALGORITHM
DOI
10.1109/TPAMI.2018.2885760
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The mixture of factor analyzers (MFA) model is a well-known mixture model-based approach for unsupervised learning with high-dimensional data. It can be useful, inter alia, in situations where the data dimensionality far exceeds the number of observations. In recent years, the MFA model has been extended to non-Gaussian mixtures to account for clusters with heavier tail weight and/or asymmetry. The mixture of generalized hyperbolic factor analyzers (MGHFA) model is one such extension, which leads to a flexible modelling paradigm that accounts for both heavier tail weight and cluster asymmetry. In many practical applications, missing values complicate data analyses. A generalization of the MGHFA model is presented to accommodate missing values. Under a missing-at-random mechanism, we develop a computationally efficient alternating expectation conditional maximization algorithm for parameter estimation of the MGHFA model with different patterns of missing values. The imputation of missing values under an incomplete-data structure of the MGHFA model is also investigated. The performance of our proposed methodology is illustrated through the analysis of simulated and real data.
Pages: 610-621
Number of pages: 12