Tackling Documentation Debt: A Survey on Algorithmic Fairness Datasets

Cited by: 8

Authors:

Fabris, Alessandro [1]

Messina, Stefano [1]

Silvello, Gianmaria [1]

Susto, Gian Antonio [1]

Affiliations:

[1] Univ Padua, Padua, Italy

Source:

ACM CONFERENCE ON EQUITY AND ACCESS IN ALGORITHMS, MECHANISMS, AND OPTIMIZATION, EAAMO 2022 | 2022

Keywords:

Algorithmic fairness; Data studies; Documentation debt

DOI:

10.1145/3551624.3555286

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A growing community of researchers has been investigating the equity of algorithms, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations. Progress in fair Machine Learning (ML) hinges on data, which can be appropriately used only if adequately documented. Unfortunately, the research community, as a whole, suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity). In this work, we survey over two hundred datasets employed in algorithmic fairness research, producing standardized and searchable documentation for each of them. Moreover, we rigorously identify the three most popular fairness datasets, namely Adult, COMPAS, and German Credit, for which we compile in-depth documentation. This unifying documentation effort targets documentation sparsity and supports multiple contributions. In the first part of this work, we summarize the merits and limitations of Adult, COMPAS, and German Credit, adding to and unifying recent scholarship, calling into question their suitability as general-purpose fairness benchmarks. To overcome this limitation, we document hundreds of available alternatives, annotating their domain and the algorithmic fairness tasks they support, along with additional properties of interest for fairness practitioners and researchers, including their format, cardinality, and the sensitive attributes they encode. In the second part, we summarize this information, zooming in on the domains and tasks supported by these resources. Overall, we assemble and summarize sparse information on hundreds of datasets into a single resource, which we make available to the community, with the aim of tackling the data documentation debt.


Pages: 13

