DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

Cited by: 0
Authors
Moreira, Ricardo [1]
Bono, Jacopo [1]
Cardoso, Mario [1]
Saleiro, Pedro [1]
Figueiredo, Mario [2]
Bizarro, Pedro [1]
Affiliations
[1] Feedzai, Coimbra, Portugal
[2] ULisboa, Inst Super Tecn, ELLIS Unit Lisbon, Inst Telecomunicacoes, Lisbon, Portugal
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the predictive task performance. Despite the recent rapid advances in AI explainability, as far as we know, no method yet fulfills these three desiderata. Indeed, mainstream methods for local concept explainability do not yield causal explanations and incur a trade-off between explainability and prediction accuracy. We present DiConStruct, an explanation method that is both concept-based and causal, which produces more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Consequently, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
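The distillation idea in the abstract, a separate model trained to approximate a black box's predictions without touching the black box itself, can be illustrated with a minimal sketch. This is not the DiConStruct method: it omits the structural causal model and the concept attributions entirely. The `black_box` function, the two-feature inputs, and the logistic surrogate are all assumptions made up for illustration.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model (assumption): returns a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.5 * x[1] + 0.5)))


def surrogate(w, b, x):
    """Logistic surrogate that will be fitted to mimic the black box."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def distill(inputs, targets, epochs=500, lr=0.5):
    """Fit the surrogate to the black-box scores (soft targets) by
    full-batch gradient descent on the cross-entropy loss."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(inputs, targets):
            err = surrogate(w, b, x) - t  # gradient of the loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fidelity_gap(w, b, inputs, targets):
    """Mean absolute difference between surrogate and black-box scores."""
    return sum(abs(surrogate(w, b, x) - t) for x, t in zip(inputs, targets)) / len(inputs)


rng = random.Random(0)
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
targets = [black_box(x) for x in inputs]  # query the black box; never modify it

untrained_gap = fidelity_gap([0.0, 0.0], 0.0, inputs, targets)
w, b = distill(inputs, targets)
gap = fidelity_gap(w, b, inputs, targets)
print(f"fidelity gap before: {untrained_gap:.4f}, after: {gap:.4f}")
```

Because the surrogate is trained only on the black box's outputs, the black box's own predictions are left unchanged, which is the property the abstract describes as not impacting the prediction task.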
Pages: 740-768
Number of pages: 29
Related papers
50 in total
  • [31] FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning
    Han, Pengchao
    Shi, Xingyan
    Huang, Jianwei
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2024, 42 (11) : 3064 - 3077
  • [32] Improving Diversity in Black-Box Few-Shot Knowledge Distillation
    Vo, Tri-Nhan
    Nguyen, Dang
    Do, Kien
    Gupta, Sunil
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT II, ECML PKDD 2024, 2024, 14942 : 178 - 196
  • [33] Debias the Black-Box: A Fair Ranking Framework via Knowledge Distillation
    Zhu, Zhitao
    Si, Shijing
    Wang, Jianzong
    Yang, Yaodong
    Xiao, Jing
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2022, 2022, 13724 : 395 - 405
  • [34] AKD: Using Adversarial Knowledge Distillation to Achieve Black-box Attacks
    Lian, Xin
    Huang, Zhiqiu
    Wang, Chao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [35] Zero-Shot Knowledge Distillation from a Decision-Based Black-Box Model
    Wang, Zi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7688 - 7699
  • [36] Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure
    Novello, Paul
    Fel, Thomas
    Vigouroux, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [37] Extracting Explanations, Justification, and Uncertainty from Black-Box Deep Neural Networks
    Ardis, Paul
    Flenner, Arjuna
    ASSURANCE AND SECURITY FOR AI-ENABLED SYSTEMS, 2024, 13054
  • [38] BLACK-BOX ATTACKS ON IMAGE ACTIVITY PREDICTION AND ITS NATURAL LANGUAGE EXPLANATIONS
    Baia, Alina Elena
    Poggioni, Valentina
    Cavallaro, Andrea
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3688 - 3697
  • [39] Rule-based approximation of black-box classifiers for tabular data to generate global and local explanations
    Maszczyk, Cezary
    Kozielski, Michal
    Sikora, Marek
    PROCEEDINGS OF THE 2022 17TH CONFERENCE ON COMPUTER SCIENCE AND INTELLIGENCE SYSTEMS (FEDCSIS), 2022, : 89 - 92
  • [40] Physical Black-Box Adversarial Attacks Through Transformations
    Jiang, Wenbo
    Li, Hongwei
    Xu, Guowen
    Zhang, Tianwei
    Lu, Rongxing
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (03) : 964 - 974