Explaining by Removing: A Unified Framework for Model Explanation

Cited by: 0

Authors:

Covert, Ian C. [1]

Lundberg, Scott [2]

Lee, Su-In [1]

Affiliations:

[1] Univ Washington, Paul G Allen Sch Comp Sci Engn, Seattle, WA 98195 USA

[2] Microsoft Corp, Microsoft Res, Redmond, WA 98052 USA

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2021 / Vol. 22

Keywords:

Model explanation; interpretability; information theory; cooperative game theory; psychology; BLACK-BOX; CLASSIFICATIONS; EXPRESSION; REGRESSION; DECISIONS;

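DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract: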

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based explanations, that are based on the principle of simulating feature removal to quantify each feature's influence. These methods vary in several respects, so we develop a framework that characterizes each method along three dimensions: 1) how the method removes features, 2) what model behavior the method explains, and 3) how the method summarizes each feature's influence. Our framework unifies 26 existing methods, including several of the most widely used approaches: SHAP, LIME, Meaningful Perturbations, and permutation tests. This newly understood class of explanation methods has rich connections that we examine using tools that have been largely overlooked by the explainability literature. To anchor removal-based explanations in cognitive psychology, we show that feature removal is a simple application of subtractive counterfactual reasoning. Ideas from cooperative game theory shed light on the relationships and trade-offs among different methods, and we derive conditions under which all removal-based explanations have information-theoretic interpretations. Through this analysis, we develop a unified framework that helps practitioners better understand model explanation tools, and that offers a strong theoretical foundation upon which future explainability research can build.
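
The three dimensions named in the abstract can be illustrated with a minimal sketch. It is one simple instance of a removal-based explanation under stated assumptions, not the authors' implementation: features are "removed" by replacing them with fixed baseline values, the behavior explained is a single prediction, and each feature's influence is summarized by the change in output when that feature alone is removed. The names remove_features, model_behavior, leave_one_out, and toy_model are hypothetical and do not come from the paper.

```python
import numpy as np

def remove_features(x, keep_mask, baseline):
    """Dimension 1 (assumed choice): simulate removal by swapping
    dropped features for fixed baseline values."""
    return np.where(keep_mask, x, baseline)

def model_behavior(model, x):
    """Dimension 2 (assumed choice): the behavior being explained is
    the model's prediction for this single input."""
    return model(x)

def leave_one_out(model, x, baseline):
    """Dimension 3 (assumed choice): summarize each feature's influence
    as the drop in output when that feature alone is removed."""
    d = x.shape[0]
    all_kept = np.ones(d, dtype=bool)
    full = model_behavior(model, remove_features(x, all_kept, baseline))
    scores = np.zeros(d)
    for i in range(d):
        mask = all_kept.copy()
        mask[i] = False  # remove only feature i
        reduced = model_behavior(model, remove_features(x, mask, baseline))
        scores[i] = full - reduced
    return scores

# Hypothetical toy model: a fixed linear function of three features.
weights = np.array([2.0, -1.0, 0.5])

def toy_model(x):
    return float(weights @ x)

x = np.array([1.0, 3.0, 4.0])
baseline = np.zeros(3)  # assumed baseline: all-zero input
scores = leave_one_out(toy_model, x, baseline)
print(scores)
```

For a linear model with a zero baseline, each score equals the feature's weight times its value. Changing any one of the three functions (for example, averaging over sampled replacement values instead of a fixed baseline, or weighting over many feature subsets instead of removing one feature at a time) gives a different method within the same family.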

Pages: 90
