Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning Analytics

Cited by: 1
Authors
Liu, Qinyi [1]
Khalil, Mohammad [1]
Shakya, Ronas [1]
Jovanovic, Jelena [1,2]
Affiliations
[1] Univ Bergen, Ctr Sci Learning & Technol SLATE, Bergen, Norway
[2] Univ Belgrade, Fac Org Sci, Belgrade, Serbia
Keywords
Learning analytics; Synthetic data generation; Generative adversarial network; Privacy Preserving Technologies; IDENTIFICATION; INSUFFICIENT
DOI
10.1145/3636555.3636921
CLC classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Privacy poses a significant obstacle to the progress of learning analytics (LA), raising challenges such as inadequate anonymization and data misuse that current solutions struggle to address. Synthetic data has emerged as a potential remedy that offers robust privacy protection. However, prior LA research on synthetic data lacks the thorough evaluation needed to assess the delicate balance between privacy and data utility: synthetic data must not only enhance privacy but also remain practical for data analytics. Moreover, different LA scenarios have different privacy and utility needs, which makes selecting an appropriate synthetic data approach a pressing challenge. To address these gaps, we propose a comprehensive evaluation of synthetic data that covers three dimensions of synthetic data quality: resemblance, utility, and privacy. We apply this evaluation to three distinct LA datasets, using three different synthetic data generation methods. Our results show that synthetic data can maintain utility (i.e., predictive performance) similar to that of real data while preserving privacy. Furthermore, taking into account the differing privacy and data utility requirements of LA scenarios, we make tailored recommendations for synthetic data generation. This paper not only presents a comprehensive evaluation of synthetic data but also illustrates its potential to mitigate privacy concerns in LA. It thereby contributes to a wider application of synthetic data in the field and promotes better open science practice.
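The abstract names three dimensions of synthetic data quality (resemblance, utility, privacy) but does not specify the metrics behind them. The sketch below shows what such a three-part evaluation could look like. It is not the authors' method. It is written in plain Python and assumes common stand-in metrics:

* the two-sample Kolmogorov-Smirnov (KS) statistic per column for resemblance;
* train-on-real versus train-on-synthetic accuracy (TRTR vs TSTR) for utility, using a simple nearest-centroid classifier;
* distance to closest record (DCR) for privacy.

All function names and the toy data are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]  # one numeric feature vector (hypothetical representation)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    def ecdf(sample: Sequence[float], x: float) -> float:
        return sum(v <= x for v in sample) / len(sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


def resemblance(real: List[Row], synth: List[Row]) -> float:
    """Mean per-column KS statistic; 0.0 means identical marginal distributions."""
    n_cols = len(real[0])
    stats = [ks_statistic([r[j] for r in real], [s[j] for s in synth]) for j in range(n_cols)]
    return sum(stats) / n_cols


def fit_centroids(X: List[Row], y: List[int]) -> Dict[int, List[float]]:
    """Nearest-centroid classifier: one mean feature vector per class label."""
    groups: Dict[int, List[Row]] = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)] for label, rows in groups.items()}


def accuracy(centroids: Dict[int, List[float]], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose closest centroid carries the true label."""
    preds = [min(centroids, key=lambda lab: math.dist(row, centroids[lab])) for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def utility(real_X, real_y, synth_X, synth_y, test_X, test_y) -> Tuple[float, float]:
    """Return (TRTR, TSTR): accuracy on held-out real data when trained on real vs synthetic."""
    trtr = accuracy(fit_centroids(real_X, real_y), test_X, test_y)
    tstr = accuracy(fit_centroids(synth_X, synth_y), test_X, test_y)
    return trtr, tstr


def privacy(real: List[Row], synth: List[Row]) -> Tuple[float, float]:
    """Return (median DCR, exact-copy rate).

    DCR is the Euclidean distance from each synthetic row to its closest real row.
    """
    dcr = [min(math.dist(s, r) for r in real) for s in synth]
    return median(dcr), sum(d == 0.0 for d in dcr) / len(dcr)


if __name__ == "__main__":
    # Toy data: two numeric features and a binary label; purely illustrative.
    real_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
    real_y = [0, 0, 1, 1]
    synth_X = [[0.2, 0.4], [0.1, 0.9], [4.8, 5.3], [5.1, 5.8]]
    synth_y = [0, 0, 1, 1]
    test_X, test_y = [[0.0, 0.2], [5.0, 5.2]], [0, 1]

    print("resemblance (mean KS):", resemblance(real_X, synth_X))
    print("utility (TRTR, TSTR):", utility(real_X, real_y, synth_X, synth_y, test_X, test_y))
    print("privacy (median DCR, copy rate):", privacy(real_X, synth_X))
```

Reading the three numbers together reflects the trade-off the abstract describes:

* A lower mean KS suggests closer marginal distributions.
* TSTR accuracy near TRTR suggests that predictive utility is preserved.
* Very small DCR values, especially exact copies (DCR = 0), flag possible leakage of real records.

A realistic evaluation would likely use stronger classifiers and dedicated privacy attacks in place of these stand-ins.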
Pages: 620-631
Number of pages: 12