The Dark Side of Machine Learning Algorithms: How and Why They Can Leverage Bias, and What Can Be Done to Pursue Algorithmic Fairness

Cited by: 5
Author
Vasileva, Mariya I. [1]
Institution
[1] Univ Illinois, Champaign, IL 61820 USA
Keywords
Bias in machine learning algorithms; fairness; accountability; transparency; representation learning
DOI
10.1145/3394486.3411068
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine learning and access to big data are revolutionizing the way many industries operate, bringing analytics and automation to many real-world tasks that were previously thought to be necessarily manual. With the pervasiveness of artificial intelligence and machine learning over the past decade, and their rapid spread across a variety of applications, algorithmic fairness has become a prominent open research problem. For instance, machine learning is used in courts to assess the probability that a defendant will reoffend; in the medical domain to assist with diagnosis or to predict predisposition to certain diseases; in social welfare systems; and in autonomous vehicles. The decision-making processes in these real-world applications have a direct effect on people's lives, and can cause harm to society if the deployed machine learning algorithms are not designed with fairness in mind.

The ability to collect and analyze large datasets for problems in many domains brings with it the danger of implicit data bias, which can be harmful. Data, especially big data, is often heterogeneous, generated by different subgroups with their own characteristics and behaviors. Furthermore, data collection strategies vary vastly across domains, and examples are labelled by human annotators, so the labelling process can amplify any inherent biases the annotators harbor. A model learned on biased data may not only produce unfair and inaccurate predictions, but may also significantly disadvantage certain subgroups and lead to unfairness in downstream learning tasks. Discriminatory bias can seep into data in multiple ways. In medical domains, for example, there are many instances in which the data used are skewed toward certain populations, which can have dangerous consequences for underrepresented communities [1]. Another example is the large-scale datasets widely used in machine learning tasks, such as ImageNet and Open Images: [2] shows that these datasets suffer from representation bias, and advocates for the need to incorporate geo-diversity and inclusion. Yet another example is the popular face recognition and generation datasets such as CelebA and Flickr-Faces-HQ, in which the ethnic and racial breakdown of example faces shows significant representation bias, evident in downstream tasks such as reconstructing a face from an obfuscated image [8].

To fight discriminatory uses of machine learning algorithms that leverage such biases, one must first define the notion of algorithmic fairness. Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their intrinsic or acquired traits in the context of decision making [3]. Fairness definitions fall into three broad types: individual fairness (similar individuals receive similar predictions [4, 5]), group fairness (different groups are treated equally [4, 5]), and subgroup fairness (a group fairness constraint is selected, and the task is to determine whether it holds over a large collection of subgroups [6, 7]). In this talk, I will give a formal definition of these fairness constraints, examine the ways in which machine learning algorithms can amplify representation bias, and discuss how bias in both the example set and the label set of popular datasets has been misused in a discriminatory manner.
I will touch upon the issues of ethics and accountability, and present open research directions for tackling algorithmic fairness at the representation level.
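The group fairness notion above can be made concrete with simple audit metrics. The following is a minimal illustrative sketch (not from the talk itself) in Python, computing two standard group-fairness quantities for a binary classifier: the demographic parity gap (difference in positive-prediction rates across groups) and the equal opportunity gap (difference in true-positive rates across groups). All data and names here are hypothetical.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rates, P(y_hat = 1 | group),
    across the protected groups present in `group`."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true-positive rates across the protected groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Hypothetical toy data: binary labels and predictions for two subgroups.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(demographic_parity_gap(y_pred, group))         # 0.25
print(equal_opportunity_gap(y_true, y_pred, group))  # ~0.33
```

Individual and subgroup fairness can be audited in a similar spirit: the former compares predictions across pairs of similar individuals [4, 5], while the latter checks whether a chosen group constraint, such as the gaps above, holds over a large collection of subgroups [6, 7].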
Pages: 3586-3587
Number of pages: 2