Tabular data synthesis with generative adversarial networks: design space and optimizations

Cited by: 6
Authors
Liu, Tongyu [1]
Fan, Ju [1]
Li, Guoliang [2]
Tang, Nan [3]
Du, Xiaoyong [1]
Affiliations
[1] Renmin Univ China, Beijing 100872, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
[3] HKUST GZ, Guangzhou 511455, Peoples R China
Source
VLDB JOURNAL | 2024 / Vol. 33 / Issue 02
Keywords
Tabular data synthesis; Generative adversarial networks; GAN optimizations; Data privacy; PRIVACY;
DOI
10.1007/s00778-023-00807-y
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand have limitations on effectively balancing the trade-off between privacy and utility of the released data. To address this problem, the database community and machine learning community have recently studied a new problem of tabular data synthesis using generative adversarial networks (GANs) and proposed various algorithms. However, a comprehensive comparison between GAN-based methods and conventional approaches is still lacking, making it unclear why and how GANs can outperform conventional approaches in synthesizing tabular data. Moreover, it is difficult for practitioners to understand which components are necessary when building a GAN model for tabular data synthesis. To bridge this gap, we conduct a comprehensive experimental study that investigates applying GAN to tabular data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We provide optimization techniques to handle difficulties in training GAN in practice. We conduct extensive experiments to explore the design space, comparing with traditional data synthesis approaches. Through extensive experiments, we find that GAN is very promising for tabular data synthesis and provide guidance for selecting appropriate design choices. We also point out limitations of GAN and identify future research directions. We make all code and datasets public for future research.
Pages: 255 - 280
Page count: 26
Related papers
50 in total
  • [21] Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability
    Gomez-Martinez, Vanesa
    Chushig-Muzo, David
    Veierod, Marit B.
    Granja, Conceicao
    Soguero-Ruiz, Cristina
    BIODATA MINING, 2024, 17 (01):
  • [22] Generating Realistic Synthetic Traffic Data using Conditional Tabular Generative Adversarial Networks for Intelligent Transportation Systems
    Nigam, Archana
    Srivastava, Sanjay
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 2881 - 2886
  • [23] Range-Constrained Generative Adversarial Network: Design Synthesis Under Constraints Using Conditional Generative Adversarial Networks
    Nobari, Amin Heyrani
    Chen, Wei
    Ahmed, Faez
    JOURNAL OF MECHANICAL DESIGN, 2022, 144 (02)
  • [24] Attribute-Aware Generative Design With Generative Adversarial Networks
    Yuan, Chenxi
    Moghaddam, Mohsen
    IEEE ACCESS, 2020, 8 : 190710 - 190721
  • [25] Improving Generative Adversarial Networks via Adversarial Learning in Latent Space
    Li, Yang
    Mo, Yichuan
    Shi, Liangliang
    Yan, Junchi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [26] Generative adversarial networks for the design of acoustic metamaterials
    Gurbuz, Caglar
    Kronowetter, Felix
    Dietz, Christoph
    Eser, Martin
    Schmid, Jonas
    Marburg, Steffen
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2021, 149 (02) : 1162 - 1174
  • [27] TOPOLOGY DESIGN WITH CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS
    Sharpe, Conner
    Seepersad, Carolyn Conner
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2019, VOL 2A, 2020,
  • [28] Generative Adversarial Networks for spot weld design
    Gerlach, Tobias
    Eggink, Derk H. D.
    2021 26TH IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2021,
  • [29] Generative Adversarial Networks for Bitcoin Data Augmentation
    Zola, Francesco
    Lukas Bruse, Jan
    Etxeberria Barrio, Xabier
    Galar, Mikel
    Orduna Urrutia, Raul
    2020 2ND CONFERENCE ON BLOCKCHAIN RESEARCH & APPLICATIONS FOR INNOVATIVE NETWORKS AND SERVICES (BRAINS), 2020, : 136 - 143
  • [30] Augmenting data with generative adversarial networks: An overview
    Ljubic, Hrvoje
    Martinovic, Goran
    Volaric, Tomislav
    INTELLIGENT DATA ANALYSIS, 2022, 26 (02) : 361 - 378