Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Cited by: 4
Authors
Chen Z. [1 ]
Silvestri F. [2 ]
Tolomei G. [2 ]
Wang J. [3 ]
Zhu H. [4 ]
Ahn H. [1 ]
Affiliations
[1] Stony Brook University, Department of Applied Mathematics and Statistics, Stony Brook, NY 11794
[2] Sapienza University of Rome, Department of Computer Engineering and Department of Computer Science, Rome
[3] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou
[4] Rutgers University-New Brunswick, Department of Computer Science, Piscataway, NJ 08854
Source
IEEE Transactions on Artificial Intelligence
Keywords
Counterfactual explanations; deep reinforcement learning (DRL); explainable artificial intelligence (XAI); machine learning (ML) explainability
DOI
10.1109/TAI.2022.3223892
Abstract
Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm for generating optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task. We then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate the ability of our method to provide actionable recommendations and distill interpretable policy explanations in two practical real-world use cases. © 2020 IEEE.
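The abstract frames CF generation as a sequential decision process solved with a discrete-continuous hybrid action space. The sketch below is a minimal, hypothetical illustration of that framing, not the authors' ReLAX implementation: the class and parameter names (CounterfactualEnv, lambda_sparsity, predict) are introduced here for illustration, the reward is a simple flip-plus-sparsity term, and a random policy stands in for the trained DRL agent.

```python
import numpy as np


class CounterfactualEnv:
    """Toy environment casting CF generation as sequential decision-making.

    Hypothetical sketch (not the paper's ReLAX implementation): the state is the
    current feature vector; each hybrid action picks a feature (discrete) and a
    perturbation amount (continuous); the episode ends when the black-box
    prediction flips or the step budget runs out.
    """

    def __init__(self, predict_fn, x_original, max_steps=10, lambda_sparsity=0.1):
        self.predict_fn = predict_fn                      # black box: x -> label
        self.x_original = np.asarray(x_original, dtype=float)
        self.max_steps = max_steps
        self.lambda_sparsity = lambda_sparsity            # sparsity penalty weight

    def reset(self):
        self.x = self.x_original.copy()
        self.y_original = self.predict_fn(self.x)
        self.steps = 0
        return self.x.copy()

    def step(self, feature_idx, delta):
        """Apply one hybrid action: perturb feature `feature_idx` by `delta`."""
        self.x[feature_idx] += delta
        self.steps += 1

        flipped = self.predict_fn(self.x) != self.y_original
        n_changed = int(np.count_nonzero(self.x != self.x_original))

        # Reward flipping the prediction while penalizing non-sparse changes.
        reward = (1.0 if flipped else 0.0) - self.lambda_sparsity * n_changed
        done = flipped or self.steps >= self.max_steps
        return self.x.copy(), reward, done


if __name__ == "__main__":
    # Toy linear "black box"; a random policy stands in for the trained DRL agent.
    predict = lambda x: int(x.sum() > 5.0)
    env = CounterfactualEnv(predict, x_original=[1.0, 2.0, 1.0])
    state, done = env.reset(), False
    rng = np.random.default_rng(0)
    while not done:
        idx = int(rng.integers(len(state)))               # discrete action part
        delta = float(rng.normal(scale=1.0))              # continuous action part
        state, reward, done = env.step(idx, delta)
    print("Counterfactual candidate:", state)
```

In the paper, a distillation step additionally converts the learned policy into a decision tree; in terms of this sketch, that would amount to logging the (state, feature_idx, delta) tuples visited by the trained agent and fitting a tree to them so the CF-generation process itself is interpretable.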
Pages: 1443 - 1457
Page count: 14