The Dark Side of Machine Learning Algorithms: How and Why They Can Leverage Bias, and What Can Be Done to Pursue Algorithmic Fairness

Cited by: 5
Author
Vasileva, Mariya I. [1]
Institution
[1] Univ Illinois, Champaign, IL 61820 USA
Keywords
Bias in machine learning algorithms; fairness; accountability; transparency; representation learning
DOI
10.1145/3394486.3411068
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine learning and access to big data are revolutionizing the way many industries operate, bringing analytics and automation to many real-world tasks that were previously thought to be necessarily manual. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their rapid spread across a variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant will reoffend; in the medical domain to assist with diagnosis or to predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the deployed machine learning algorithms are not designed with fairness in mind.

The ability to collect and analyze large datasets for problems in many domains brings with it the danger of implicit data bias, which can be harmful. Data, especially big data, is often heterogeneous, generated by different subgroups with their own characteristics and behaviors. Furthermore, data collection strategies vary vastly across domains, and examples are labelled by human annotators, so the labelling process can amplify any inherent biases the annotators harbor. A model learned on biased data may not only produce unfair and inaccurate predictions, but may also significantly disadvantage certain subgroups and lead to unfairness in downstream learning tasks. Discriminatory bias can seep into data in multiple ways. In medical domains, for example, there are many instances in which the data used are skewed toward certain populations, which can have dangerous consequences for underrepresented communities [1]. Another example is the large-scale datasets widely used in machine learning tasks, such as ImageNet and Open Images: [2] shows that these datasets suffer from representation bias, and advocates for the need to incorporate geo-diversity and inclusion. Yet another example is the popular face recognition and generation datasets such as CelebA and Flickr-Faces-HQ, in which the ethnic and racial breakdown of example faces shows significant representation bias, evident in downstream tasks such as reconstructing a face from an obfuscated image [8].

To fight discriminatory uses of machine learning algorithms that leverage such biases, one must first define the notion of algorithmic fairness. Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their intrinsic or acquired traits in the context of decision making [3]. Fairness definitions fall into three broad types: individual fairness (similar individuals receive similar predictions [4, 5]), group fairness (different groups are treated equally [4, 5]), and subgroup fairness (a group fairness constraint is selected, and the task is to determine whether it holds over a large collection of subgroups [6, 7]). In this talk, I will give a formal definition of these fairness constraints, examine the ways in which machine learning algorithms can amplify representation bias, and discuss how bias in both the example set and the label set of popular datasets has been misused in a discriminatory manner.
I will touch upon the issues of ethics and accountability, and present open research directions for tackling algorithmic fairness at the representation level.
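The group fairness notion above can be made concrete with simple audit metrics. The following is a minimal illustrative sketch (not from the talk itself) in Python, computing two standard group-fairness quantities for a binary classifier: the demographic parity gap (difference in positive-prediction rates across groups) and the equal opportunity gap (difference in true-positive rates across groups). All data and names here are hypothetical.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rates, P(y_hat = 1 | group),
    across the protected groups present in `group`."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true-positive rates across the protected groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Hypothetical toy data: binary labels and predictions for two subgroups.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(demographic_parity_gap(y_pred, group))         # 0.25
print(equal_opportunity_gap(y_true, y_pred, group))  # ~0.33
```

Individual and subgroup fairness can be audited in a similar spirit: the former compares predictions across pairs of similar individuals [4, 5], while the latter checks whether a chosen group constraint, such as the gaps above, holds over a large collection of subgroups [6, 7].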
Pages: 3586-3587
Number of pages: 2