DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

Citations: 0
Authors
Moreira, Ricardo [1]
Bono, Jacopo [1]
Cardoso, Mario [1]
Saleiro, Pedro [1]
Figueiredo, Mario [2]
Bizarro, Pedro [1]
Affiliations
[1] Feedzai, Coimbra, Portugal
[2] ULisboa, Inst Super Tecn, ELLIS Unit Lisbon, Inst Telecomunicacoes, Lisbon, Portugal
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the predictive task performance. Despite the recent rapid advances in AI explainability, as far as we know, no method yet fulfills these three desiderata. Indeed, mainstream methods for local concept explainability do not yield causal explanations and incur a trade-off between explainability and prediction accuracy. We present DiConStruct, an explanation method that is both concept-based and causal, which produces more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Consequently, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
Pages: 740-768
Number of pages: 29