Dimension reduction of high-dimensional dataset with missing values

Cited by: 2
Authors
Zhang, Ran [1 ]
Ye, Bin [2 ]
Liu, Peng [2 ]
Affiliations
[1] Xuzhou Med Univ, Sch Med Informat & Engn, Xuzhou, Jiangsu, Peoples R China
[2] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
Keywords
Dimension reduction; high-dimensional data; missing value; principal component analysis; covariance matrix estimation; spectrum estimation; imputation
DOI
10.1177/1748302619867440
Chinese Library Classification (CLC)
TP39 [Computer Applications]
Discipline Classification Code
081203; 0835
Abstract
Nowadays, datasets containing a very large number of variables or features are routinely generated in many fields. Dimension reduction techniques are usually applied before these datasets are analyzed statistically, in order to avoid the effects of the curse of dimensionality. Principal component analysis is one of the most important techniques for dimension reduction and data visualization. However, missing values arise in almost every field; they produce biased estimates and are difficult to handle, especially in high-dimension, low-sample-size settings. By exploiting a Lasso estimator of the population covariance matrix, we propose a regularized principal component analysis to reduce the dimensionality of datasets with missing data. The Lasso estimator of the covariance matrix is computationally tractable, as it is obtained by solving a convex optimization problem. To illustrate the effectiveness of our method for dimension reduction, the estimated principal component directions are evaluated using the Frobenius norm and cosine distance, and the performance is compared with that of other methods for handling incomplete data, such as mean substitution and multiple imputation. Simulation results also show that our method is superior to these methods in the context of discriminant analysis of real-world high-dimensional datasets.
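The abstract only outlines the approach, so the following is a minimal sketch in Python of the general recipe it describes, not the authors' implementation: the covariance matrix is estimated from pairwise-complete observations, element-wise soft-thresholding serves as a simple stand-in for the Lasso-regularized covariance estimator obtained from a convex program, and the leading eigenvectors of the regularized estimate give the principal component directions. The function names, the threshold lam, and the pairwise-complete estimator are illustrative assumptions; the Frobenius-norm and cosine-distance metrics mirror the ones named in the abstract.

```python
# Minimal sketch under the assumptions stated above; only NumPy is required.
import numpy as np

def pairwise_covariance(X):
    """Covariance estimated from pairwise-complete observations (NaN marks missing)."""
    mask = ~np.isnan(X)
    Xc = np.where(mask, X - np.nanmean(X, axis=0), 0.0)   # center, zero-fill missing entries
    counts = mask.T.astype(float) @ mask.astype(float)    # co-observed sample sizes per pair
    return (Xc.T @ Xc) / np.maximum(counts - 1.0, 1.0)

def soft_threshold_cov(S, lam):
    """Element-wise soft-thresholding: a simple Lasso-type shrinkage of the covariance."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))                        # leave the variances unshrunk
    return T

def regularized_pca_directions(X, lam, k):
    """Leading k principal component directions of the regularized covariance estimate."""
    S = soft_threshold_cov(pairwise_covariance(X), lam)
    eigvals, eigvecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]

def frobenius_error(U_hat, U_true):
    """Frobenius norm between the projection matrices of two estimated subspaces."""
    return np.linalg.norm(U_hat @ U_hat.T - U_true @ U_true.T, ord="fro")

def cosine_distance(u_hat, u_true):
    """Cosine distance between two direction vectors, invariant to sign flips."""
    return 1.0 - abs(float(u_hat @ u_true)) / (np.linalg.norm(u_hat) * np.linalg.norm(u_true))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 200))                     # n = 50 samples, p = 200 features
    X[rng.random(X.shape) < 0.1] = np.nan                  # 10% missing completely at random
    U = regularized_pca_directions(X, lam=0.1, k=2)
    print(U.shape)                                         # (200, 2)
```

Soft-thresholding is just one convex, Lasso-type shrinkage of the covariance; the paper's actual estimator may differ, and in practice lam would be chosen by cross-validation or a similar criterion.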
Pages: 8
Related Papers
50 records in total
  • [1] High-dimensional Data Dimension Reduction Based on KECA
    Hu, Yongde
    Pan, Jingchang
    Tan, Xin
    [J]. SENSORS, MEASUREMENT AND INTELLIGENT MATERIALS, PTS 1-4, 2013, 303-306 : 1101 - 1104
  • [2] Dimension Reduction for High-Dimensional Vector Autoregressive Models
    Cubadda, Gianluca
    Hecq, Alain
    [J]. OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 2022, 84 (05) : 1123 - 1152
  • [3] DECIDING THE DIMENSION OF EFFECTIVE DIMENSION REDUCTION SPACE FOR FUNCTIONAL AND HIGH-DIMENSIONAL DATA
    Li, Yehua
    Hsing, Tailen
    [J]. ANNALS OF STATISTICS, 2010, 38 (05): : 3028 - 3062
  • [4] Visualisation and dimension reduction of high-dimensional data for damage detection
    Worden, K
    Manson, G
    [J]. IMAC - PROCEEDINGS OF THE 17TH INTERNATIONAL MODAL ANALYSIS CONFERENCE, VOLS I AND II, 1999, 3727 : 1576 - 1585
  • [5] High-dimensional sufficient dimension reduction through principal projections
    Pircalabelu, Eugen
    Artemiou, Andreas
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (01): : 1804 - 1830
  • [6] Optimal dimension reduction for high-dimensional and functional time series
    Hallin M.
    Hörmann S.
    Lippi M.
    [J]. Statistical Inference for Stochastic Processes, 2018, 21 (2) : 385 - 398
  • [7] Improving Penalized Logistic Regression Model with Missing Values in High-Dimensional Data
    Alharthi, Aiedh Mrisi
    Lee, Muhammad Hisyam
    Algamal, Zakariya Yahya
    [J]. INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2022, 18 (02) : 40 - 54
  • [8] Revisiting the Problem of Missing Values in High-Dimensional Data and Feature Selection Effect
    Elia, Marina G.
    Duan, Wenting
    [J]. ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, PT I, AIAI 2024, 2024, 711 : 201 - 213
  • [9] Handling high-dimensional data with missing values by modern machine learning techniques
    Chen, Sixia
    Xu, Chao
    [J]. JOURNAL OF APPLIED STATISTICS, 2023, 50 (03) : 786 - 804
  • [10] High-dimensional Learned Index Based on Space Division and Dimension Reduction
    Zhang, Shao-Min
    Cai, Pan
    Li, Cui-Ping
    Chen, Hong
    [J]. Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): : 2413 - 2426