Synthetic Dataset Generation for Fairer Unfairness Research

Authors
Jiang, Lan [1]
Belitz, Clara [1]
Bosch, Nigel [1, 2]
Affiliations
[1] Univ Illinois, Sch Informat Sci, Champaign, IL 61820 USA
[2] Univ Illinois, Dept Educ Psychol, Champaign, IL USA
Keywords
datasets; student data; fair machine learning; data generation
DOI
10.1145/3636555.3636868
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recent research has made strides toward fair machine learning. Relatively few datasets, however, are commonly used to evaluate these fairness-aware algorithms, and even fewer come from education domains, which can lead to a narrow focus on particular types of fairness issues. In this paper, we describe a novel dataset modification method that uses a genetic algorithm to induce many types of unfairness into datasets. Our method can either generate an unfairness benchmark dataset from scratch (thus avoiding data collection in situations that might exploit marginalized populations) or modify an existing dataset used as a reference point. It increases unfairness by 156.3% on average across datasets and unfairness definitions while preserving AUC scores for models trained on the original dataset (just 0.3% change, on average). We investigate how well the method generalizes across educational datasets with different characteristics and evaluate three common unfairness mitigation algorithms. The results show that our method can generate datasets that exhibit different types of unfairness, that are large or small, that contain different types of features, and that affect models trained with different classifiers. Datasets generated with this method can be used for benchmarking and testing in future research on the measurement and mitigation of algorithmic unfairness.
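The abstract does not specify the genetic algorithm's encoding, operators, or fitness function, so the following is only a toy sketch of the general idea, not the authors' method. It assumes binary labels, a single binary group attribute, demographic parity difference as the unfairness measure, and a cap on the number of flipped labels (`max_flips`) as a crude stand-in for the paper's goal of preserving AUC. All function names and parameters here are hypothetical.

```python
import random

def parity_gap(labels, groups):
    """Absolute difference in positive-label rate between group 0 and group 1."""
    rates = []
    for g in (0, 1):
        members = [y for y, a in zip(labels, groups) if a == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])

def fitness(candidate, original, groups, max_flips):
    """Reward a larger parity gap; reject candidates that drift too far
    from the original labels (assumed proxy for preserving model quality)."""
    flips = sum(c != o for c, o in zip(candidate, original))
    if flips > max_flips:
        return -1.0
    return parity_gap(candidate, groups)

def mutate(labels, rng, rate=0.05):
    """Flip each label independently with a small probability."""
    return [1 - y if rng.random() < rate else y for y in labels]

def crossover(a, b, rng):
    """Single-point crossover of two label vectors."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(original, groups, max_flips, generations=50, pop_size=30, seed=0):
    """Evolve label vectors toward a larger parity gap, keeping the best half
    of each generation unchanged (elitism)."""
    rng = random.Random(seed)
    score = lambda c: fitness(c, original, groups, max_flips)
    population = [list(original)]
    population += [mutate(original, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)

# Toy data: two groups of 20 with identical positive rates (gap of 0).
groups = [0] * 20 + [1] * 20
original = [1, 0] * 20
result = evolve(original, groups, max_flips=8)
print(parity_gap(original, groups), parity_gap(result, groups))
```

Because the unmodified labels are in the initial population and the top half of each generation is carried over, the returned candidate is never less unfair than the original under this measure and never exceeds the flip budget. A faithful implementation would replace the flip budget with an actual AUC check on a trained model and would support the multiple unfairness definitions the paper evaluates.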
Pages: 200-209
Page count: 10