Missing value imputation in a data matrix using the regularised singular value decomposition

Cited by: 1
Authors
Arciniegas-Alarcon, Sergio [1]
Garcia-Pena, Marisol [2]
Krzanowski, Wojtek J. [3]
Rengifo, Camilo [1]
Affiliations
[1] Univ Sabana, Fac Ingn, Chia, Colombia
[2] Pontificia Univ Javeriana, Dept Matemat, Bogota, Colombia
[3] Univ Exeter, Coll Engn Math & Phys Sci, Exeter, England
Keywords
Eigenvalues; Eigenvectors; Iterative computational scheme; Cross-validation; Genotype-by-environment interaction; Overfitting; GGE BIPLOT
DOI
10.1016/j.mex.2023.102289
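Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract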
Some statistical analysis techniques may require complete data matrices, but a frequent problem in the construction of databases is the incomplete collection of information for different reasons. One option to tackle the problem is to estimate and impute the missing data. This paper describes a form of imputation that mixes regression with lower rank approximations. To improve the quality of the imputations, a generalisation is proposed that replaces the singular value decomposition (SVD) of the matrix with a regularised SVD in which the regularisation parameter is estimated by cross-validation. To evaluate the performance of the proposal, ten sets of real data from multienvironment trials were used. Missing values were created in each set at four percentages of missing not at random, and three criteria were then considered to investigate the effectiveness of the proposal. The results show that the regularised method proves very competitive when compared to the original method, beating it in several of the considered scenarios. As it is a very general system, its application can be extended to all multivariate data matrices.
• The imputation method is modified through the inclusion of a stable and efficient computational algorithm that replaces the classical SVD least squares criterion by a penalised criterion. This penalty produces smoothed eigenvectors and eigenvalues that avoid overfitting problems, improving the performance of the method when the penalty is necessary. The size of the penalty can be determined by minimising one of the following criteria: the prediction errors, the Procrustes similarity statistic or the critical angles between subspaces of principal components.
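The general idea in the abstract (iterative low-rank imputation with a penalty whose size is chosen by prediction error) can be illustrated with a minimal sketch. This is not the authors' algorithm: their method combines regression with the SVD, whereas the sketch below swaps in a simpler, generic scheme that repeatedly refits a truncated SVD with soft-thresholded singular values. The function names `impute_regularised_svd` and `choose_penalty`, the parameters `rank`, `lam`, `grid` and `holdout_frac`, and the hold-out scheme are all assumptions made for illustration.

```python
import numpy as np


def impute_regularised_svd(X, rank=2, lam=0.0, n_iter=200, tol=1e-8):
    """Fill NaNs in X by iterating a shrunken low-rank SVD approximation.

    Illustrative only: singular values are soft-thresholded by `lam`
    (an assumed penalty form, not the one defined in the paper).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means; fall back to the overall mean for empty columns.
    observed_counts = (~missing).sum(axis=0)
    column_sums = np.where(missing, 0.0, X).sum(axis=0)
    overall_mean = column_sums.sum() / max(observed_counts.sum(), 1)
    column_means = np.where(observed_counts > 0,
                            column_sums / np.maximum(observed_counts, 1),
                            overall_mean)
    filled = np.where(missing, column_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s[:rank] - lam, 0.0)  # penalised singular values
        approx = (U[:, :rank] * s_shrunk) @ Vt[:rank]
        updated = np.where(missing, approx, X)  # observed cells never change
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change <= tol * (1.0 + np.linalg.norm(filled)):
            break
    return filled


def choose_penalty(X, rank, grid, holdout_frac=0.1, seed=0):
    """Pick the penalty in `grid` with the lowest hold-out prediction error."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))
    n_hold = max(1, int(holdout_frac * len(observed)))
    held = observed[rng.choice(len(observed), size=n_hold, replace=False)]
    rows, cols = held[:, 0], held[:, 1]
    X_train = X.copy()
    X_train[rows, cols] = np.nan  # hide some known cells
    errors = []
    for lam in grid:
        fitted = impute_regularised_svd(X_train, rank=rank, lam=lam)
        rmse = np.sqrt(np.mean((fitted[rows, cols] - X[rows, cols]) ** 2))
        errors.append(float(rmse))
    return grid[int(np.argmin(errors))], errors
```

A typical call would be `best_lam, errors = choose_penalty(X, rank=2, grid=[0.0, 0.5, 1.0])` followed by `impute_regularised_svd(X, rank=2, lam=best_lam)`. Only the prediction-error criterion is sketched here; the Procrustes statistic and the critical angles mentioned in the abstract are not implemented.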
Pages: 8
Related papers
50 in total
  • [1] A Bayesian Singular Value Decomposition Procedure for Missing Data Imputation
    Zhai, Ruoshui
    Gutman, Roee
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (02) : 470 - 482
  • [2] Faster Imputation Using Singular Value Decomposition for Sparse Data
    Phuc Nguyen
    Tran, Linh G. H.
    Le, Bao H.
    Nguyen, Thuong H. T.
    Thu Nguyen
    Nguyen, Hien D.
    Nguyen, Binh T.
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT I, 2023, 13995 : 135 - 146
  • [3] Missing-value imputation using the robust singular-value decomposition: Proposals and numerical evaluation
    Garcia-Pena, Marisol
    Arciniegas-Alarcon, Sergio
    Krzanowski, Wojtek J.
    Duarte, Diego
    CROP SCIENCE, 2021, 61 (05) : 3288 - 3300
  • [4] Imputation of Mixed Data With Multilevel Singular Value Decomposition
    Husson, Francois
    Josse, Julie
    Narasimhan, Balasubramanian
    Robin, Genevieve
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2019, 28 (03) : 552 - 566
  • [5] Missing value imputation in time series using Singular Spectrum Analysis
    Mahmoudvand, Rahim
    Rodrigues, Paulo Canas
    INTERNATIONAL JOURNAL OF ENERGY AND STATISTICS, 2016, 4 (01)
  • [6] Incremental singular value decomposition of uncertain data with missing values
    Brand, M
    COMPUTER VISION - ECCV 2002, PT 1, 2002, 2350 : 707 - 720
  • [7] Block Tensor Train Decomposition for Missing Value Imputation
    Lee, Namgil
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1338 - 1343
  • [8] Missing value imputation on missing completely at random data using multilayer perceptrons
    Silva-Ramirez, Esther-Lydia
    Pino-Mejias, Rafael
    Lopez-Coello, Manuel
    Cubiles-de-la-Vega, Maria-Dolores
    NEURAL NETWORKS, 2011, 24 (01) : 121 - 129
  • [9] Distribution-free multiple imputation in an interaction matrix through singular value decomposition
    Bergamo, Genevile Carife
    dos Santos Dias, Carlos Tadeu
    Krzanowski, Wojtek Janusz
    SCIENTIA AGRICOLA, 2008, 65 (04) : 422 - 427
  • [10] Missing value imputation strategies for metabolomics data
    Grace Armitage, Emily
    Godzien, Joanna
    Alonso-Herranz, Vanesa
    Lopez-Gonzalvez, Angeles
    Barbas, Coral
    ELECTROPHORESIS, 2015, 36 (24) : 3050 - 3060