Synthetic Dataset Generation for Fairer Unfairness Research

Authors
Jiang, Lan [1]
Belitz, Clara [1]
Bosch, Nigel [1,2]
Affiliations
[1] Univ Illinois, Sch Informat Sci, Champaign, IL 61820 USA
[2] Univ Illinois, Dept Educ Psychol, Champaign, IL USA
Keywords
datasets; student data; fair machine learning; data generation
DOI
10.1145/3636555.3636868
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Recent research has made strides toward fair machine learning. Relatively few datasets, however, are commonly examined to evaluate these fairness-aware algorithms, and even fewer in education domains, which can lead to a narrow focus on particular types of fairness issues. In this paper, we describe a novel dataset modification method that uses a genetic algorithm to induce many types of unfairness into datasets. Additionally, our method can either generate an unfairness benchmark dataset from scratch (thus avoiding data collection in situations that might exploit marginalized populations) or modify an existing dataset used as a reference point. Our method increases unfairness by 156.3% on average across datasets and unfairness definitions while preserving AUC scores for models trained on the original dataset (just a 0.3% change, on average). We investigate how well our method generalizes across educational datasets with different characteristics, and we evaluate three common unfairness mitigation algorithms. The results show that our method can generate datasets that exhibit different types of unfairness, that are large or small, that contain different types of features, and whose unfairness affects models trained with different classifiers. Datasets generated with this method can be used for benchmarking and testing in future research on the measurement and mitigation of algorithmic unfairness.
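The abstract describes the method only at a high level. As a rough illustration of the general idea (a genetic algorithm that modifies a dataset to increase unfairness while holding overall AUC steady), here is a minimal toy sketch in Python. It is not the authors' implementation. The label-flipping genome, the between-group AUC gap used as the unfairness measure, the penalty weight of 10, and the names auc, group_auc_gap, apply_flips, fitness and evolve are all hypothetical choices made for this example. The fixed scores stand in for predictions from a model assumed to have been trained on the original data; no model is retrained.

```python
import random

# Toy sketch (hypothetical, NOT the paper's algorithm): evolve a set of label
# flips that widens the between-group AUC gap while penalizing drift in overall AUC.


def auc(scores, labels):
    """Rank-based AUC: P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # AUC is undefined with a single class; treat it as chance level
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_auc_gap(scores, labels, groups):
    """Hypothetical unfairness measure: |AUC(group 0) - AUC(group 1)|."""
    per_group = []
    for k in (0, 1):
        idx = [i for i, g in enumerate(groups) if g == k]
        per_group.append(auc([scores[i] for i in idx], [labels[i] for i in idx]))
    return abs(per_group[0] - per_group[1])


def apply_flips(labels, genome):
    """A genome holds one boolean per row: True means flip that row's binary label."""
    return [1 - y if flip else y for y, flip in zip(labels, genome)]


def fitness(genome, scores, labels, groups, base_auc, penalty=10.0):
    """Reward unfairness; penalize moving overall AUC away from the original value."""
    new_labels = apply_flips(labels, genome)
    drift = abs(auc(scores, new_labels) - base_auc)
    return group_auc_gap(scores, new_labels, groups) - penalty * drift


def evolve(scores, labels, groups, pop_size=30, generations=40, flip_rate=0.05, seed=0):
    """Simple elitist GA over label-flip genomes; returns (modified labels, original AUC)."""
    rng = random.Random(seed)
    n = len(labels)
    base_auc = auc(scores, labels)

    def score(genome):
        return fitness(genome, scores, labels, groups, base_auc)

    pop = [[rng.random() < flip_rate for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [False] * n  # seed the population with the unmodified dataset
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation at a low per-gene rate
            child = [(not bit) if rng.random() < flip_rate / 5 else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return apply_flips(labels, best), base_auc


if __name__ == "__main__":
    data_rng = random.Random(1)
    groups = [i % 2 for i in range(40)]
    labels = [data_rng.randint(0, 1) for _ in range(40)]
    scores = [0.6 * y + 0.4 * data_rng.random() for y in labels]
    new_labels, base = evolve(scores, labels, groups)
    print("original AUC:", round(base, 3))
    print("AUC after flips:", round(auc(scores, new_labels), 3))
    print("gap before:", round(group_auc_gap(scores, labels, groups), 3))
    print("gap after:", round(group_auc_gap(scores, new_labels, groups), 3))
```

Because the population is seeded with the unmodified dataset and the best half of each generation is carried forward unchanged, the returned dataset always scores at least as well on this toy fitness as the original. The penalty weight trades off how much unfairness is induced against how far overall AUC is allowed to drift.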
Pages: 200-209
Number of pages: 10