Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Citations: 0
Authors
Movin, Maria [1 ,2 ]
Dinis Junior, Guilherme [1 ,2 ]
Hollmen, Jaakko [1 ,2 ]
Papapetrou, Panagiotis [2 ]
Affiliations
[1] Spotify, Stockholm, Sweden
[2] Stockholm Univ, Stockholm, Sweden
Keywords
Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations;
DOI
10.1007/978-3-031-30047-9_25
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent's behavior may reduce productivity in human-agent collaborations or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: what does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy, a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
Pages: 314-326
Page count: 13
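
To make the idea in the abstract concrete, below is a minimal sketch (not the authors' implementation) of how a counterfactual policy could be obtained and used for explanation: a second agent is trained on an alternative reward, and the explanation is the set of states where its greedy action deviates from the black-box agent's. The toy chain environment, reward functions, and Q-learning hyperparameters are hypothetical stand-ins.

import numpy as np

N_STATES, N_ACTIONS = 5, 2  # toy chain MDP: action 0 moves left, action 1 moves right

def step(state, action):
    # Deterministic transition along the chain.
    return min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)

def train_q(reward_fn, episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    # Plain tabular Q-learning; returns the learned Q-table.
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(horizon):
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
            s2 = step(s, a)
            r = reward_fn(s, a, s2)
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
            s = s2
    return q

# "Black-box" agent: optimizes the original reward (reach the rightmost state).
q_black_box = train_q(lambda s, a, s2: float(s2 == N_STATES - 1))

# Counterfactual policy: trained on an alternative target outcome
# (reach the leftmost state instead).
q_counterfactual = train_q(lambda s, a, s2: float(s2 == 0))

# The explanation: states in which the counterfactual policy acts differently
# from the black-box agent, i.e. where behavior would have to change to
# achieve the alternative outcome.
for s in range(N_STATES):
    a_bb = int(q_black_box[s].argmax())
    a_cf = int(q_counterfactual[s].argmax())
    if a_bb != a_cf:
        print(f"state {s}: black-box action {a_bb} -> counterfactual action {a_cf}")

In this sketch the printed state-action pairs play the role of the explanation: they tell the user in which states, and how, the agent would need to behave differently to reach the alternative target outcome.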