Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks

Cited by: 3
Authors
Nik, Alireza Hossein Zadeh [1, 2]
Riegler, Michael A. [1, 3]
Halvorsen, Pal [1, 4]
Storas, Andrea M. [1, 4]
Affiliations
[1] SimulaMet, Oslo, Norway
[2] Univ Stavanger, Stavanger, Norway
[3] Univ Tromso, Tromso, Norway
[4] OsloMet, Oslo, Norway
Keywords
Synthetic data generation; Deep learning; Medical data
DOI
10.1007/978-3-031-27077-2_34
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-quality tabular data is a crucial requirement for developing data-driven applications, especially healthcare-related ones, because most of the data collected in this context today is in tabular form. However, strict data protection laws complicate access to medical datasets. Thus, synthetic data has become an attractive alternative for data scientists and healthcare professionals seeking to circumvent such hurdles. Although many healthcare institutions still use classical de-identification and anonymization techniques for generating synthetic data, deep learning-based generative models such as generative adversarial networks (GANs) have shown remarkable performance in generating tabular datasets with complex structures. This paper examines the potential and applicability of GANs within the healthcare industry, which often faces serious challenges with insufficient training data and the sensitivity of patient records. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. Healthcare datasets with different sizes, numbers of variables, column data types, feature distributions, and inter-variable correlations are examined. Moreover, a comprehensive evaluation framework is defined to evaluate the quality of the synthetic records and the viability of each model in preserving patients' privacy. The results indicate that the investigated models can generate synthetic datasets that maintain the statistical characteristics, model compatibility, and privacy of the original data. Moreover, synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for further improvement in designing an architecture for generating synthetic tabular data.
Pages: 434-446
Number of pages: 13
Related papers
  • [1] TabFairGAN: Fair Tabular Data Generation with Generative Adversarial Networks
    Rajabi, Amirarsalan
    Garibay, Ozlem Ozmen
    [J]. MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2022, 4 (02) : 488 - 501
  • [2] Generation of Realistic Synthetic Validation Healthcare Datasets Using Generative Adversarial Networks
    Ozyigit, Eda Bilici
    Arvanitis, Theodoros N.
    Despotou, George
    [J]. IMPORTANCE OF HEALTH INFORMATICS IN PUBLIC HEALTH DURING A PANDEMIC, 2020, 272 : 322 - 325
  • [3] Generation of Synthetic Data with Conditional Generative Adversarial Networks
    Vega-Marquez, Belen
    Rubio-Escudero, Cristina
    Nepomuceno-Chamorro, Isabel
    [J]. LOGIC JOURNAL OF THE IGPL, 2022, 30 (02) : 252 - 262
  • [4] Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy
    Kang, Ha Ye Jin
    Batbaatar, Erdenebileg
    Choi, Dong-Woo
    Choi, Kui Son
    Ko, Minsam
    Ryu, Kwang Sun
    [J]. JMIR MEDICAL INFORMATICS, 2023, 11
  • [5] Distance Correlation GAN: Fair Tabular Data Generation with Generative Adversarial Networks
    Rajabi, Amirarsalan
    Garibay, Ozlem Ozmen
    [J]. ARTIFICIAL INTELLIGENCE IN HCI, AI-HCI 2023, PT I, 2023, 14050 : 431 - 445
  • [6] Generating Realistic Synthetic Traffic Data using Conditional Tabular Generative Adversarial Networks for Intelligent Transportation Systems
    Nigam, Archana
    Srivastava, Sanjay
    [J]. 2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023 : 2881 - 2886
  • [7] Synthetic Fingerprint Generation Using Generative Adversarial Networks: A Review
    Dhaneshwar, Ritika
    Taya, Arnav
    Kaur, Mandeep
    [J]. FOURTH CONGRESS ON INTELLIGENT SYSTEMS, VOL 1, CIS 2023, 2024, 868 : 375 - 387
  • [8] Synthetic Behavior Sequence Generation Using Generative Adversarial Networks
    Akbari F.
    Sartipi K.
    Archer N.
    [J]. ACM Transactions on Computing for Healthcare, 2023, 4 (01)
  • [9] Generation of synthetic full-scale burst test data for corroded pipelines using the tabular generative adversarial network
    He, Z.
    Zhou, W.
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 115
  • [10] Using UMAP for Partially Synthetic Healthcare Tabular Data Generation and Validation
    Lázaro, Carla
    Angulo, Cecilio
    [J]. Sensors, 2024, 24 (23)