Tabular data synthesis with generative adversarial networks: design space and optimizations

Cited by: 6
Authors
Liu, Tongyu [1]
Fan, Ju [1]
Li, Guoliang [2]
Tang, Nan [3]
Du, Xiaoyong [1]
Affiliations
[1] Renmin Univ China, Beijing 100872, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
[3] HKUST GZ, Guangzhou 511455, Peoples R China
Source
VLDB JOURNAL | 2024 / Volume 33 / Issue 02
Keywords
Tabular data synthesis; Generative adversarial networks; GAN optimizations; Data privacy; PRIVACY
DOI
10.1007/s00778-023-00807-y
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand are limited in how effectively they balance the trade-off between privacy and utility of the released data. To address this problem, the database community and machine learning community have recently studied a new problem of tabular data synthesis using generative adversarial networks (GANs) and proposed various algorithms. However, a comprehensive comparison between GAN-based methods and conventional approaches is still lacking, making it unclear why and how GANs can outperform conventional approaches in synthesizing tabular data. Moreover, it is difficult for practitioners to understand which components are necessary when building a GAN model for tabular data synthesis. To bridge this gap, we conduct a comprehensive experimental study that investigates applying GANs to tabular data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We provide optimization techniques to handle difficulties in training GANs in practice. We conduct extensive experiments to explore the design space and compare against traditional data synthesis approaches. From these experiments, we find that GANs are very promising for tabular data synthesis, and we provide guidance for selecting appropriate design choices. We also point out limitations of GANs and identify future research directions. We make all code and datasets public for future research.
Pages: 255-280
Page count: 26