Contrastive Explanations for Model Interpretability

Cited by: 0
Authors
Jacovi, Alon [1]
Swayamdipta, Swabha [2]
Ravfogel, Shauli [1, 2]
Elazar, Yanai [1, 2]
Choi, Yejin [2, 3]
Goldberg, Yoav [1, 2]
Affiliations
[1] Bar Ilan Univ, Ramat Gan, Israel
[2] Allen Inst Artificial Intelligence, Seattle, WA USA
[3] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
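Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract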
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive to humans to both produce and comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows model behavior to consider only contrastive reasoning, and uncover which aspects of the input are useful for and against particular decisions. Additionally, for a given input feature, our contrastive explanations can answer for which label, and against which alternative label, is the feature useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
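The latent-space projection described in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification not stated in this record, that the classifier ends in a linear layer with weight matrix W, so the direction separating two labels is the difference of their weight rows. All names below (contrastive_projection, logit_gap, fact, foil) are hypothetical and are not taken from the paper.

```python
import numpy as np

def contrastive_projection(h, W, fact, foil):
    """Project representation h onto the direction that separates two labels.

    Assumption: the final layer is linear, so W[fact] - W[foil] is the only
    direction in latent space that affects the choice between the two labels.
    """
    u = W[fact] - W[foil]           # contrastive direction
    u = u / np.linalg.norm(u)       # normalise to unit length
    return np.dot(h, u) * u         # component of h along that direction

def logit_gap(h, W, fact, foil):
    """Difference between the logits of the two labels for representation h."""
    return float(np.dot(W[fact] - W[foil], h))

# Toy example with random weights: 3 labels, 8 latent features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
h = rng.normal(size=8)

h_c = contrastive_projection(h, W, fact=0, foil=1)
gap_full = logit_gap(h, W, 0, 1)
gap_proj = logit_gap(h_c, W, 0, 1)
```

Under this assumption the projected representation keeps the full logit gap between the two chosen labels, while the discarded remainder h - h_c has no effect on that gap. Attribution methods applied to h_c would then reflect only the features relevant to choosing one label over the other.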
Pages: 1597-1611
Page count: 15