PreCoF: counterfactual explanations for fairness

Citations: 0
Authors
Sofie Goethals
David Martens
Toon Calders
Affiliations
[1] University of Antwerp, Department of Engineering Management
[2] University of Antwerp, Department of Computer Science
Source
Machine Learning | 2024 / Volume 113
Keywords
Explainable Artificial Intelligence; Counterfactual explanations; Fairness; Data science ethics;
DOI
Not available
Abstract
This paper studies how counterfactual explanations can be used to assess the fairness of a model. Using machine learning for high-stakes decisions is a threat to fairness, as these models can amplify bias present in the dataset, and there is no consensus on a universal metric to detect this. The appropriate metric and method to tackle bias in a dataset are case-dependent and first require insight into the nature of that bias. We aim to provide this insight by integrating explainable AI (XAI) research with the fairness domain. More specifically, apart from using (Predictive) Counterfactual Explanations to detect explicit bias when the model directly uses the sensitive attribute, we show that they can also be used to detect implicit bias, when the model does not use the sensitive attribute directly but does use other correlated attributes that lead to a substantial disadvantage for a protected group. We call this metric PreCoF, or Predictive Counterfactual Fairness. Our experimental results show that PreCoF succeeds in detecting occurrences of implicit bias in the model by assessing which attributes appear more often in the explanations of the protected group than in those of the unprotected group. These results could help policymakers decide whether this discrimination is justified.
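The core comparison the abstract describes can be illustrated with a minimal sketch: given counterfactual explanations (the set of attributes each explanation changes) for instances in the protected and unprotected groups, compare how often each attribute appears per group. This is an illustrative reconstruction, not the authors' implementation; the function name `precof_style_comparison`, the toy attribute names, and the frequency-difference scoring are all assumptions for demonstration.

```python
from collections import Counter

def precof_style_comparison(explanations, groups):
    """Illustrative sketch (not the paper's code): for each attribute,
    return the difference between its frequency in the counterfactual
    explanations of the protected group and of the unprotected group.

    explanations: list of sets, each holding the attribute names that
                  one instance's counterfactual explanation changed
    groups:       parallel list of "protected" / "unprotected" labels
    """
    counts = {"protected": Counter(), "unprotected": Counter()}
    totals = {"protected": 0, "unprotected": 0}
    for attrs, group in zip(explanations, groups):
        counts[group].update(attrs)
        totals[group] += 1
    all_attrs = set(counts["protected"]) | set(counts["unprotected"])
    return {
        a: counts["protected"][a] / max(totals["protected"], 1)
           - counts["unprotected"][a] / max(totals["unprotected"], 1)
        for a in all_attrs
    }

# Hypothetical toy data: "zip_code" dominates the protected group's
# explanations, hinting it may act as a proxy for the sensitive attribute.
expl = [{"zip_code"}, {"zip_code", "income"}, {"income"}, {"education"}]
grps = ["protected", "protected", "unprotected", "unprotected"]
diff = precof_style_comparison(expl, grps)
print(diff["zip_code"])  # 1.0: changed in every protected explanation, no unprotected one
```

A large positive score for an attribute means the protected group's counterfactuals disproportionately require changing it, which is the kind of signal the paper proposes surfacing for policymakers to judge.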
Pages: 3111–3142
Page count: 31