Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Cited by: 4
Authors
Chen Z. [1 ]
Silvestri F. [2 ]
Tolomei G. [2 ]
Wang J. [3 ]
Zhu H. [4 ]
Ahn H. [1 ]
Affiliations
[1] Stony Brook University, Department of Applied Mathematics and Statistics, Stony Brook, 11794, NY
[2] Sapienza University of Rome, Department of Computer Engineering and Department of Computer Science, Rome
[3] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou
[4] Rutgers University-New Brunswick, Department of Computer Science, Piscataway, 08854, NJ
Source
IEEE Transactions on Artificial Intelligence
Keywords
Counterfactual explanations; deep reinforcement learning (DRL); explainable artificial intelligence (XAI); machine learning (ML) explainability;
DOI
10.1109/TAI.2022.3223892
Abstract
Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient on large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm for generating optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate the ability of our method to provide actionable recommendations and to distill interpretable policy explanations in two practical real-world use cases. © 2020 IEEE.
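The core idea stated in the abstract (CF search as a sequential decision process with a hybrid discrete-continuous action, followed by distilling the policy into a decision tree) can be illustrated with a minimal sketch. The snippet below is not the authors' ReLAX implementation: the policy is a random stand-in for a trained DRL agent, and the names `random_policy`, `generate_cf`, and `surrogate` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# A toy black-box classifier standing in for the target model to explain.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
black_box = LogisticRegression().fit(X, y)

def random_policy(state):
    """Stand-in for the learned DRL policy: a hybrid action is
    (discrete feature index to edit, continuous perturbation size)."""
    feature = int(rng.integers(state.shape[0]))   # discrete component
    delta = float(rng.normal(scale=0.5))          # continuous component
    return feature, delta

def generate_cf(x, policy, target_class, max_steps=20):
    """Apply the policy step by step until the model's prediction flips
    (or the step budget runs out); one feature changes per step, which is
    what keeps the resulting counterfactual sparse."""
    cf = x.copy()
    for _ in range(max_steps):
        if black_box.predict(cf.reshape(1, -1))[0] == target_class:
            break
        feature, delta = policy(cf)
        cf[feature] += delta
    return cf

# Distillation step: fit a shallow decision tree that imitates the policy's
# discrete choice, so the CF-generation process itself becomes interpretable.
states = rng.normal(size=(200, 4))
chosen_features = np.array([random_policy(s)[0] for s in states])
surrogate = DecisionTreeClassifier(max_depth=3).fit(states, chosen_features)

x0 = X[0]
original = black_box.predict(x0.reshape(1, -1))[0]
cf = generate_cf(x0, random_policy, target_class=1 - original)
print("original class:", original,
      "-> counterfactual class:", black_box.predict(cf.reshape(1, -1))[0])
```

In the paper's setting the random policy above would be replaced by an agent trained with a reward that encourages flipping the prediction with as few and as small feature edits as possible; the decision-tree surrogate is what makes that learned perturbation strategy human-readable.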
Pages: 1443-1457
Page count: 14
Related Papers
50 records in total
  • [1] Learning Model-Agnostic Counterfactual Explanations for Tabular Data
    Pawelczyk, Martin
    Broelemann, Klaus
    Kasneci, Gjergji
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 3126 - 3132
  • [2] Model-Agnostic Counterfactual Explanations in Credit Scoring
    Dastile, Xolani
    Celik, Turgay
    Vandierendonck, Hans
    IEEE ACCESS, 2022, 10 : 69543 - 69554
  • [3] Model-Agnostic Counterfactual Explanations for Consequential Decisions
    Karimi, Amir-Hossein
    Barthe, Gilles
    Balle, Borja
    Valera, Isabel
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 895 - 904
  • [4] RelEx: A Model-Agnostic Relational Model Explainer
    Zhang, Yue
    Defazio, David
    Ramesh, Arti
    AIES '21: PROCEEDINGS OF THE 2021 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2021, : 1042 - 1049
  • [5] MANE: Model-Agnostic Non-linear Explanations for Deep Learning Model
    Tian, Yue
    Liu, Guanjun
    2020 IEEE WORLD CONGRESS ON SERVICES (SERVICES), 2020, : 33 - 36
  • [6] Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation
    Wang, Siyu
    Chen, Xiaocong
    McAuley, Julian
    Cripps, Sally
    Yao, Lina
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, : 1 - 12
  • [7] Individualized help for at-risk students using model-agnostic and counterfactual explanations
    Smith, Bevan I.
    Chimedza, Charles
    Buhrmann, Jacoba H.
    EDUCATION AND INFORMATION TECHNOLOGIES, 2022, 27 (02) : 1539 - 1558
  • [8] CountARFactuals - Generating Plausible Model-Agnostic Counterfactual Explanations with Adversarial Random Forests
    Dandl, Susanne
    Blesch, Kristin
    Freiesleben, Timo
    Koenig, Gunnar
    Kapar, Jan
    Bischl, Bernd
    Wright, Marvin N.
    EXPLAINABLE ARTIFICIAL INTELLIGENCE, PT III, XAI 2024, 2024, 2155 : 85 - 107
  • [9] Multi-Agent Chronological Planning with Model-Agnostic Meta Reinforcement Learning
    Hu, Cong
    Xu, Kai
    Zhu, Zhengqiu
    Qin, Long
    Yin, Quanjun
    APPLIED SCIENCES-BASEL, 2023, 13 (16):