Synthetic data generation by probabilistic PCA

Author
Park, Min-Jeong [1]
Affiliation
[1] Stat Korea, Govt Complex Daejeon, 189 Cheongsa Ro, Daejeon 35208, South Korea
Keywords
synthetic data; probabilistic principal component analysis; IMPUTATION;
DOI
10.5351/KJAS.2023.36.4.279
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Generating synthetic data sets by the sequential regression multiple imputation (SRMI) method is a well-established approach, and the R package synthpop is widely used to generate synthetic data in this way. In this paper, I propose generating synthetic data based on probabilistic principal component analysis (PPCA). Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. The simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those in data sets generated by SRMI. Furthermore, for data types for which PPCA applications are well established, such as time series data, the PPCA approach can be extended to generate synthetic data sets.
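The abstract does not spell out the generation procedure, so the following is only a minimal sketch of one way PPCA could be used to produce synthetic records. It assumes continuous variables, the standard Gaussian PPCA model x = W z + mu + eps, and the closed-form maximum-likelihood fit of Tipping and Bishop. The function names `fit_ppca` and `sample_ppca`, the toy data, and the choice of q are illustrative assumptions and are not taken from the paper. The final comparison uses pairwise correlation coefficients, which is one plausible reading of the "pairwise coefficients" mentioned in the abstract.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form ML fit of Gaussian PPCA with q latent dimensions (q < d)."""
    mu = X.mean(axis=0)
    # Maximum-likelihood (biased) sample covariance of the original data.
    S = np.cov(X, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the d - q discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Loadings: leading eigenvectors scaled by sqrt(lambda_j - sigma2).
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic records from the fitted model x = W z + mu + eps."""
    d, q = W.shape
    z = rng.standard_normal((n, q))                        # latent scores
    eps = rng.standard_normal((n, d)) * np.sqrt(sigma2)    # isotropic noise
    return mu + z @ W.T + eps


# Hypothetical toy data standing in for an original data set.
rng = np.random.default_rng(0)
X_orig = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))

mu, W, sigma2 = fit_ppca(X_orig, q=2)
X_syn = sample_ppca(mu, W, sigma2, n=500, rng=rng)

# Compare pairwise correlations of original and synthetic data.
corr_gap = np.abs(np.corrcoef(X_orig, rowvar=False)
                  - np.corrcoef(X_syn, rowvar=False)).max()
print("largest absolute gap in pairwise correlations:", corr_gap)
```

In this sketch the synthetic records are fresh draws from the fitted distribution rather than perturbed copies of original rows. How closely the pairwise correlations are reproduced depends on q and on how well the Gaussian low-rank model fits the data.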
Pages: 279-294
Number of pages: 16