Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Cited by: 4
Authors
Chen Z. [1 ]
Silvestri F. [2 ]
Tolomei G. [2 ]
Wang J. [3 ]
Zhu H. [4 ]
Ahn H. [1 ]
Affiliations
[1] Stony Brook University, Department of Applied Mathematics and Statistics, Stony Brook, NY 11794
[2] Sapienza University of Rome, Department of Computer Engineering and Department of Computer Science, Rome
[3] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou
[4] Rutgers University-New Brunswick, Department of Computer Science, Piscataway, NJ 08854
Source
IEEE Transactions on Artificial Intelligence
Keywords
Counterfactual explanations; deep reinforcement learning (DRL); explainable artificial intelligence (XAI); machine learning (ML) explainability
DOI
10.1109/TAI.2022.3223892
Abstract
Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm for generating optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task. We then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate the ability of our method to provide actionable recommendations and distill interpretable policy explanations in two practical real-world use cases. © 2020 IEEE.
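The abstract frames CF generation as a sequential decision process solved with a discrete-continuous hybrid action space. The sketch below is a minimal, hypothetical illustration of that framing, not the authors' ReLAX implementation: the class and parameter names (CounterfactualEnv, lambda_sparsity, predict) are introduced here for illustration, the reward is a simple flip-plus-sparsity term, and a random policy stands in for the trained DRL agent.

```python
import numpy as np


class CounterfactualEnv:
    """Toy environment casting CF generation as sequential decision-making.

    Hypothetical sketch (not the paper's ReLAX implementation): the state is the
    current feature vector; each hybrid action picks a feature (discrete) and a
    perturbation amount (continuous); the episode ends when the black-box
    prediction flips or the step budget runs out.
    """

    def __init__(self, predict_fn, x_original, max_steps=10, lambda_sparsity=0.1):
        self.predict_fn = predict_fn                      # black box: x -> label
        self.x_original = np.asarray(x_original, dtype=float)
        self.max_steps = max_steps
        self.lambda_sparsity = lambda_sparsity            # sparsity penalty weight

    def reset(self):
        self.x = self.x_original.copy()
        self.y_original = self.predict_fn(self.x)
        self.steps = 0
        return self.x.copy()

    def step(self, feature_idx, delta):
        """Apply one hybrid action: perturb feature `feature_idx` by `delta`."""
        self.x[feature_idx] += delta
        self.steps += 1

        flipped = self.predict_fn(self.x) != self.y_original
        n_changed = int(np.count_nonzero(self.x != self.x_original))

        # Reward flipping the prediction while penalizing non-sparse changes.
        reward = (1.0 if flipped else 0.0) - self.lambda_sparsity * n_changed
        done = flipped or self.steps >= self.max_steps
        return self.x.copy(), reward, done


if __name__ == "__main__":
    # Toy linear "black box"; a random policy stands in for the trained DRL agent.
    predict = lambda x: int(x.sum() > 5.0)
    env = CounterfactualEnv(predict, x_original=[1.0, 2.0, 1.0])
    state, done = env.reset(), False
    rng = np.random.default_rng(0)
    while not done:
        idx = int(rng.integers(len(state)))               # discrete action part
        delta = float(rng.normal(scale=1.0))              # continuous action part
        state, reward, done = env.step(idx, delta)
    print("Counterfactual candidate:", state)
```

In the paper, a distillation step additionally converts the learned policy into a decision tree; in terms of this sketch, that would amount to logging the (state, feature_idx, delta) tuples visited by the trained agent and fitting a tree to them so the CF-generation process itself is interpretable.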
Pages: 1443 - 1457
Page count: 14