Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning Analytics

Authors
Liu, Qinyi [1]
Khalil, Mohammad [1]
Shakya, Ronas [1]
Jovanovic, Jelena [1,2]
Affiliations
[1] Univ Bergen, Ctr Sci Learning & Technol SLATE, Bergen, Norway
[2] Univ Belgrade, Fac Org Sci, Belgrade, Serbia
Keywords
Learning analytics; Synthetic data generation; Generative adversarial network; Privacy Preserving Technologies; IDENTIFICATION; INSUFFICIENT
DOI
10.1145/3636555.3636921
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Privacy poses a significant obstacle to the progress of learning analytics (LA), raising challenges such as inadequate anonymization and data misuse that current solutions struggle to address. Synthetic data has emerged as a potential remedy that offers robust privacy protection. However, prior LA research on synthetic data lacks the thorough evaluation needed to assess the delicate balance between privacy and data utility: synthetic data must not only enhance privacy but also remain practical for data analytics. Moreover, different LA scenarios come with different privacy and utility needs, which makes selecting an appropriate synthetic data approach a pressing challenge. To address these gaps, we propose a comprehensive evaluation of synthetic data that covers three dimensions of synthetic data quality: resemblance, utility, and privacy. We apply this evaluation to three distinct LA datasets using three different synthetic data generation methods. Our results show that synthetic data can maintain utility (i.e., predictive performance) similar to that of real data while preserving privacy. Furthermore, taking into account the differing privacy and utility requirements of LA scenarios, we offer tailored recommendations for synthetic data generation. This paper not only presents a comprehensive evaluation of synthetic data but also illustrates its potential to mitigate privacy concerns in LA, thereby contributing to wider adoption of synthetic data in the field and promoting better open-science practice.
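The abstract evaluates synthetic data along three dimensions (resemblance, utility, privacy) but does not name the specific metrics. The Python sketch below shows one plausible instance of each dimension under stated assumptions; it is not the authors' method. Resemblance is approximated as the average gap between per-feature means, utility as train-on-real versus train-on-synthetic accuracy (TRTR vs. TSTR) on held-out real data using a toy nearest-centroid classifier, and privacy as the median distance to closest record (DCR). All function names and the (features, label) row layout are hypothetical. It uses only the standard library (Python 3.8+ for math.dist).

```python
import math
import statistics

# Hypothetical row layout: (feature vector, class label).


def resemblance(real, synth):
    """Mean absolute gap between per-feature means (lower = more similar).

    A deliberately crude univariate statistic; fuller evaluations also
    compare distributions and correlations.
    """
    n_features = len(real[0][0])
    gaps = []
    for j in range(n_features):
        real_mean = statistics.mean(x[j] for x, _ in real)
        synth_mean = statistics.mean(x[j] for x, _ in synth)
        gaps.append(abs(real_mean - synth_mean))
    return statistics.mean(gaps)


def fit_nearest_centroid(rows):
    """Toy classifier: store the mean feature vector of each class."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: [statistics.mean(col) for col in zip(*xs)] for y, xs in by_label.items()}


def accuracy(centroids, rows):
    """Share of rows whose nearest class centroid matches the true label."""
    hits = sum(
        min(centroids, key=lambda c: math.dist(x, centroids[c])) == y
        for x, y in rows
    )
    return hits / len(rows)


def utility(real_train, synth_train, real_test):
    """Return (TRTR, TSTR): accuracy on real test data after training on real vs. synthetic."""
    trtr = accuracy(fit_nearest_centroid(real_train), real_test)
    tstr = accuracy(fit_nearest_centroid(synth_train), real_test)
    return trtr, tstr


def distance_to_closest_record(real, synth):
    """Median Euclidean distance from each synthetic row to its nearest real row.

    Values near 0 suggest the generator is reproducing real records (a privacy risk).
    """
    return statistics.median(
        min(math.dist(s, r) for r, _ in real) for s, _ in synth
    )


if __name__ == "__main__":
    # Tiny made-up data purely to exercise the functions.
    real_train = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
    synth_train = [([0.1, 0.0], 0), ([0.1, 0.2], 0), ([1.1, 0.9], 1), ([0.8, 1.0], 1)]
    real_test = [([0.1, 0.1], 0), ([1.0, 0.9], 1)]
    print("resemblance gap:", resemblance(real_train, synth_train))
    print("TRTR / TSTR accuracy:", utility(real_train, synth_train, real_test))
    print("median DCR:", distance_to_closest_record(real_train, synth_train))
```

Read together, a TSTR accuracy close to TRTR suggests that utility is preserved, while a DCR well above zero suggests that synthetic rows are not near-copies of real ones; the two typically pull against each other, which is the privacy-utility balance the abstract describes. A real evaluation would use stronger classifiers and richer metrics than these toy stand-ins.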
Pages: 620-631
Number of pages: 12