Tabular data synthesis with generative adversarial networks: design space and optimizations

Cited by: 6

Authors:

Liu, Tongyu [1]

Fan, Ju [1]

Li, Guoliang [2]

Tang, Nan [3]

Du, Xiaoyong [1]

Affiliations:

[1] Renmin Univ China, Beijing 100872, Peoples R China

[2] Tsinghua Univ, Beijing 100084, Peoples R China

[3] HKUST GZ, Guangzhou 511455, Peoples R China

Source:

VLDB JOURNAL | 2024 / Vol. 33 / Issue 2

Keywords:

Tabular data synthesis; Generative adversarial networks; GAN optimizations; Data privacy; PRIVACY

DOI:

10.1007/s00778-023-00807-y

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand have limitations in effectively balancing the trade-off between the privacy and utility of the released data. To address this problem, the database and machine learning communities have recently studied a new problem, tabular data synthesis using generative adversarial networks (GANs), and proposed various algorithms. However, a comprehensive comparison between GAN-based methods and conventional approaches is still lacking, making it unclear why and how GANs can outperform conventional approaches in synthesizing tabular data. Moreover, it is difficult for practitioners to understand which components are necessary when building a GAN model for tabular data synthesis. To bridge this gap, we conduct a comprehensive experimental study that investigates applying GANs to tabular data synthesis. We introduce a unified GAN-based framework and define a space of design solutions for each component in the framework, including neural network architectures and training strategies. We provide optimization techniques to handle the difficulties of training GANs in practice. We conduct extensive experiments to explore the design space and compare GAN-based designs with traditional data synthesis approaches. We find that GANs are very promising for tabular data synthesis, and we provide guidance for selecting appropriate design choices. We also point out limitations of GANs and identify future research directions. We make all code and datasets publicly available for future research.

Pages: 255-280

Page count: 26