Tabular data synthesis with generative adversarial networks: design space and optimizations

Cited by: 6
Authors
Liu, Tongyu [1]
Fan, Ju [1]
Li, Guoliang [2]
Tang, Nan [3]
Du, Xiaoyong [1]
Affiliations
[1] Renmin Univ China, Beijing 100872, Peoples R China
[2] Tsinghua Univ, Beijing 100084, Peoples R China
[3] HKUST GZ, Guangzhou 511455, Peoples R China
Source
VLDB JOURNAL | 2024 / Vol. 33 / Issue 02
Keywords
Tabular data synthesis; Generative adversarial networks; GAN optimizations; Data privacy; PRIVACY;
DOI
10.1007/s00778-023-00807-y
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812
Abstract
The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand have limitations on effectively balancing the trade-off between privacy and utility of the released data. To address this problem, the database community and machine learning community have recently studied a new problem of tabular data synthesis using generative adversarial networks (GANs) and proposed various algorithms. However, a comprehensive comparison between GAN-based methods and conventional approaches is still lacking, making it unclear why and how GANs can outperform conventional approaches in synthesizing tabular data. Moreover, it is difficult for practitioners to understand which components are necessary when building a GAN model for tabular data synthesis. To bridge this gap, we conduct a comprehensive experimental study that investigates applying GAN to tabular data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We provide optimization techniques to handle difficulties in training GAN in practice. We conduct extensive experiments to explore the design space, comparing with traditional data synthesis approaches. Through extensive experiments, we find that GAN is very promising for tabular data synthesis and provide guidance for selecting appropriate design choices. We also point out limitations of GAN and identify future research directions. We make all code and datasets public for future research.
Pages: 255-280
Number of pages: 26