Synthetic data generation by probabilistic PCA

Cited by: 0
Authors
Park, Min-Jeong [1]
Affiliations
[1] Stat Korea, Govt Complex Daejeon, 189 Cheongsa Ro, Daejeon 35208, South Korea
Keywords
synthetic data; probabilistic principal component analysis; imputation
DOI
10.5351/KJAS.2023.36.4.279
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Generating synthetic data sets with the sequential regression multiple imputation (SRMI) method is a well-established practice. The R package synthpop is widely used for generating synthetic data by SRMI approaches. In this paper, I suggest generating synthetic data based on the probabilistic principal component analysis (PPCA) method. Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. The simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, for data types where PPCA applications are well established, such as time series data, the PPCA approach can be extended to generate synthetic data sets.
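The abstract does not spell out the procedure, so the following is only a minimal sketch of one way PPCA-based synthesis could look, not the author's implementation. It assumes continuous variables, uses the standard closed-form maximum-likelihood PPCA fit, and draws synthetic rows from the fitted Gaussian model. The function names `fit_ppca` and `sample_ppca`, the toy one-factor data set, and the reading of "pairwise coefficients" as pairwise correlation coefficients are all assumptions made for illustration. The SRMI side of the comparison is not shown.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit with q latent dimensions (q < d)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML estimate of the covariance
    eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                   # noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic rows from x = mu + W z + eps, z ~ N(0, I), eps ~ N(0, sigma2 I)."""
    d, q = W.shape
    z = rng.standard_normal((n, q))
    eps = np.sqrt(sigma2) * rng.standard_normal((n, d))
    return mu + z @ W.T + eps

rng = np.random.default_rng(0)

# Hypothetical "original" data: four variables driven by one latent factor plus noise.
loadings = np.array([0.9, 0.8, 0.7, 0.6])
original = rng.standard_normal((2000, 1)) * loadings + 0.5 * rng.standard_normal((2000, 4))

mu, W, sigma2 = fit_ppca(original, q=1)
synthetic = sample_ppca(mu, W, sigma2, n=len(original), rng=rng)

# Largest absolute difference between original and synthetic pairwise correlations.
corr_gap = np.abs(
    np.corrcoef(original, rowvar=False) - np.corrcoef(synthetic, rowvar=False)
).max()
print(f"max pairwise correlation gap: {corr_gap:.3f}")
```

In this sketch the synthetic rows are sampled from the fitted model rather than derived from individual original records, and the correlation gap is one simple way to check how well pairwise structure is preserved. Categorical or mixed-type variables would need additional handling that is not covered here.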
Pages: 279-294
Number of pages: 16
Related papers
  • [1] Synthetic data generation by probabilistic PCA
    Park, Min-Jeong
    KOREAN JOURNAL OF APPLIED STATISTICS, 2023, 36 (04) : 279 - 294
  • [2] Synthetic data generation with probabilistic Bayesian Networks
    Gogoshin, Grigoriy
    Branciamore, Sergio
    Rodin, Andrei S.
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2021, 18 (06) : 8603 - 8621
  • [3] PROBABILISTIC PCA FOR HETEROSCEDASTIC DATA
    Hong, David
    Balzano, Laura
    Fessler, Jeffrey A.
    2019 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2019), 2019, : 26 - 30
  • [4] Generation of probabilistic synthetic data for serious games: A case study on cyberbullying
    Perez, Jaime
    Castro, Mario
    Awad, Edmond
    Lopez, Gregorio
    KNOWLEDGE-BASED SYSTEMS, 2024, 286
  • [5] HePPCAT: Probabilistic PCA for Data With Heteroscedastic Noise
    Hong, David
    Gilman, Kyle
    Balzano, Laura
    Fessler, Jeffrey A.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 4819 - 4834
  • [6] A rainfall-runoff probabilistic simulation program .1. Synthetic data generation
    Hromadka, TV
    ENVIRONMENTAL SOFTWARE, 1996, 11 (04) : 235 - 242
  • [7] Probabilistic data generation for deduplication and data linkage
    Christen, P
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 109 - 116
  • [8] A probabilistic forecasting approach towards generation of synthetic battery parameters to resolve limited data challenges
    Naaz, Falak
    Channegowda, Janamejaya
    ENERGY STORAGE, 2022, 4 (04)
  • [9] SYNTHETIC PRECIPITATION DATA GENERATION
    ABTEW, W
    MORAS, RG
    CAMPBELL, KL
    COMPUTERS & INDUSTRIAL ENGINEERING, 1990, 19 (1-4) : 582 - 586
  • [10] Islanding Detection Based on Probabilistic PCA with Missing Values in PMU Data
    Liu, Xueqin
    Laverty, David
    Best, Robert
    2014 IEEE PES GENERAL MEETING - CONFERENCE & EXPOSITION, 2014,