Evaluating Variational Autoencoder as a Private Data Release Mechanism for Tabular Data

Cited by: 10
Authors
Li, Szu-Chuang [1]
Tai, Bo-Chen [1]
Huang, Yennun [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
Keywords
variational autoencoder; private data release; k-anonymity; k-Level
DOI
10.1109/PRDC47002.2019.00050
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Multi-market businesses can collect data from different business entities and aggregate data from various sources to create value. However, privacy regulations can make it illegal to exchange data between business entities of the same parent company unless the users have opted in to allow it. Regulations such as the EU's GDPR allow data exchange if the data is anonymized appropriately. In this study, we use a variational autoencoder as a mechanism to generate synthetic data. The privacy and utility of the generated data sets are measured, and the performance is compared with that of a plain autoencoder. The primary findings of this study are: 1) a variational autoencoder can be an option for data exchange, with good accuracy even when the number of latent dimensions is low; 2) a plain autoencoder still provides better accuracy when the number of hidden nodes is high; 3) a variational autoencoder, as a generative model, can be given to a data user to generate their own version of the data that closely mimics the original data set.
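The abstract says that the privacy of the generated data sets is measured, and the keywords mention k-anonymity, but this record does not state the exact procedure. As an illustration only, the sketch below computes the textbook k-anonymity level of a table: the size of the smallest group of rows sharing the same quasi-identifier values. The function name `k_anonymity`, the column names, and the toy rows are hypothetical and are not taken from the paper.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share
    identical values on all quasi-identifier columns.

    An empty table returns 0. This is the generic k-anonymity
    definition, not necessarily the metric used in the paper.
    """
    if not rows:
        return 0
    # Count how many rows fall into each quasi-identifier combination.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical toy records standing in for a released (synthetic) table.
released = [
    {"age_band": "30-39", "zip3": "106", "income": 52},
    {"age_band": "30-39", "zip3": "106", "income": 61},
    {"age_band": "40-49", "zip3": "115", "income": 48},
    {"age_band": "40-49", "zip3": "115", "income": 75},
    {"age_band": "40-49", "zip3": "115", "income": 58},
]

k = k_anonymity(released, ["age_band", "zip3"])
print(k)
```

A larger value means each record is harder to single out among rows with the same quasi-identifiers; a value of 1 means at least one row is unique on those columns.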
Pages: 198-206
Number of pages: 9