A survey on datasets for fairness-aware machine learning

Cited by: 69
Authors
Le Quy, Tai [1]
Roy, Arjun [1, 2]
Iosifidis, Vasileios [1]
Zhang, Wenbin [3]
Ntoutsi, Eirini [2]
Affiliations
[1] Leibniz Univ Hannover, L3S Res Ctr, Hannover, Germany
[2] Free Univ Berlin, Inst Comp Sci, Berlin, Germany
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
benchmark datasets; bias; datasets for fairness; discrimination; fairness-aware machine learning; DISPARATE IMPACT; DISCRIMINATION; SURVIVAL
DOI
10.1002/widm.1452
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and the class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.
This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing
Pages: 59