Flexible High-Dimensional Unsupervised Learning with Missing Data

Cited by: 11
Authors
Wei, Yuhong [1]
Tang, Yang [1]
McNicholas, Paul D. [1]
Affiliations
[1] McMaster Univ, Dept Math & Stat, Hamilton, ON L8S 4L8, Canada
Keywords
Analytical models; Computational modeling; Data models; Unsupervised learning; Covariance matrices; Clustering algorithms; Mixture models; Clustering; factor analysis; generalized hyperbolic; missing data; mixture of factor analyzers; mixture model; model-based clustering; unsupervised classification; STOCHASTIC OZONE DAYS; T-FACTOR ANALYZERS; MIXTURE-MODELS; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; BAYESIAN-ANALYSIS; ALGORITHM
DOI
10.1109/TPAMI.2018.2885760
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The mixture of factor analyzers (MFA) model is a well-known mixture model-based approach for unsupervised learning with high-dimensional data. It can be useful, inter alia, in situations where the data dimensionality far exceeds the number of observations. In recent years, the MFA model has been extended to non-Gaussian mixtures to account for clusters with heavier tail weight and/or asymmetry. The mixture of generalized hyperbolic factor analyzers (MGHFA) model is one such extension, which leads to a flexible modelling paradigm that accounts for both heavier tail weight and cluster asymmetry. In many practical applications, the occurrence of missing values complicates data analyses. A generalization of the MGHFA is presented to accommodate missing values. Under a missing-at-random mechanism, we develop a computationally efficient alternating expectation conditional maximization algorithm for parameter estimation of the MGHFA model with different patterns of missing values. The imputation of missing values under an incomplete-data structure of MGHFA is also investigated. The performance of our proposed methodology is illustrated through the analysis of simulated and real data.
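As a rough illustration of the imputation idea mentioned in the abstract, the sketch below fills missing entries with their conditional mean given the observed entries. It is a deliberate simplification and not the authors' method: it assumes a single Gaussian component with a factor-analysis covariance (loadings times loadings transposed, plus a diagonal matrix), whereas the MGHFA uses a mixture of generalized hyperbolic components fitted by an AECM algorithm. The function name `conditional_mean_impute` and all parameter values are hypothetical.

```python
import numpy as np


def conditional_mean_impute(x, mu, sigma):
    """Fill NaN entries of x with E[x_mis | x_obs] under N(mu, sigma).

    Simplifying assumption: one Gaussian component with known parameters.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    mis = np.isnan(x)   # mask of missing coordinates
    obs = ~mis          # mask of observed coordinates
    filled = x.copy()

    if not mis.any():
        return filled   # nothing to impute
    if not obs.any():
        filled[mis] = mu[mis]   # no information: fall back to the mean
        return filled

    s_oo = sigma[np.ix_(obs, obs)]  # covariance among observed entries
    s_mo = sigma[np.ix_(mis, obs)]  # cross-covariance missing vs observed

    # Gaussian conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
    filled[mis] = mu[mis] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return filled


# Hypothetical factor-analysis covariance with p = 5 variables, q = 2 factors.
rng = np.random.default_rng(0)
p, q = 5, 2
loadings = rng.normal(size=(p, q))
noise = np.diag(rng.uniform(0.5, 1.0, size=p))
sigma = loadings @ loadings.T + noise
mu = np.zeros(p)

x = np.array([1.0, np.nan, -0.5, np.nan, 0.3])
x_filled = conditional_mean_impute(x, mu, sigma)
print(x_filled)
```

Because each observation is handled through its own observed/missing masks, the same function covers different patterns of missing values; in a mixture setting one would, under the same assumptions, weight such component-wise conditional means by the posterior component probabilities.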
Pages: 610-621
Number of pages: 12