Tackling Documentation Debt: A Survey on Algorithmic Fairness Datasets

Cited by: 8
Authors
Fabris, Alessandro [1]
Messina, Stefano [1]
Silvello, Gianmaria [1]
Susto, Gian Antonio [1]
Affiliation
[1] University of Padua, Padua, Italy
Source
ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO 2022), 2022
Keywords
Algorithmic fairness; Data studies; Documentation debt
DOI
10.1145/3551624.3555286
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A growing community of researchers has been investigating the equity of algorithms, advancing the understanding of the risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning (ML) hinges on data, which can be used appropriately only if adequately documented. Unfortunately, the research community as a whole suffers from a collective data documentation debt, caused by a lack of information on specific resources (opacity) and by the scattering of the information that is available (sparsity). In this work, we survey over two hundred datasets employed in algorithmic fairness research, producing standardized and searchable documentation for each of them. Moreover, we rigorously identify the three most popular fairness datasets, namely Adult, COMPAS, and German Credit, for which we compile in-depth documentation. This unifying documentation effort targets documentation sparsity and supports multiple contributions. In the first part of this work, we summarize the merits and limitations of Adult, COMPAS, and German Credit, adding to and unifying recent scholarship and calling into question their suitability as general-purpose fairness benchmarks. To overcome this limitation, we document hundreds of available alternatives, annotating their domain and the algorithmic fairness tasks they support, along with additional properties of interest for fairness practitioners and researchers, including their format, cardinality, and the sensitive attributes they encode. In the second part, we summarize this information, zooming in on the domains and tasks supported by these resources. Overall, we assemble sparse information on hundreds of datasets into a single resource, which we make available to the community with the aim of tackling the data documentation debt.
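The abstract describes standardized, searchable documentation that records, for each dataset, its domain, the fairness tasks it supports, its format, its cardinality, and the sensitive attributes it encodes. As a rough illustration only, a minimal Python sketch of such a record and a simple filter over it might look as follows. The class `DatasetDoc`, the function `search`, the field names, and the example entries are all hypothetical assumptions for this sketch; they are not taken from the authors' released resource, and the entries do not describe real datasets.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class DatasetDoc:
    # Hypothetical record; the fields mirror the properties named in the abstract.
    name: str
    domain: str
    tasks: FrozenSet[str]                 # fairness tasks the dataset supports
    data_format: str                      # e.g. "tabular", "text", "image"
    cardinality: int                      # number of instances
    sensitive_attributes: FrozenSet[str]  # e.g. "sex", "race", "age"


def search(
    docs: Iterable[DatasetDoc],
    domain: Optional[str] = None,
    task: Optional[str] = None,
    sensitive_attribute: Optional[str] = None,
) -> List[DatasetDoc]:
    """Return the records that satisfy every filter that is not None."""
    matches = []
    for doc in docs:
        if domain is not None and doc.domain != domain:
            continue
        if task is not None and task not in doc.tasks:
            continue
        if sensitive_attribute is not None and sensitive_attribute not in doc.sensitive_attributes:
            continue
        matches.append(doc)
    return matches


# Placeholder entries for illustration only; they do not describe real datasets.
EXAMPLE_DOCS = [
    DatasetDoc("example_tabular", "finance", frozenset({"fair classification"}),
               "tabular", 1000, frozenset({"sex", "age"})),
    DatasetDoc("example_text", "health", frozenset({"fair classification", "fair ranking"}),
               "text", 500, frozenset({"race"})),
]

if __name__ == "__main__":
    for doc in search(EXAMPLE_DOCS, task="fair ranking"):
        print(doc.name)  # prints: example_text
```

The filters combine conjunctively, so each additional filter can only narrow the result. A real catalog would likely also need free-text search and richer task and domain vocabularies, which this sketch deliberately omits.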
Pages: 13