Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Cited by: 0

Authors:

Movin, Maria [1, 2]

Dinis Junior, Guilherme [1, 2]

Hollmen, Jaakko [1, 2]

Papapetrou, Panagiotis [2]

Affiliations:

[1] Spotify, Stockholm, Sweden

[2] Stockholm Univ, Stockholm, Sweden

Source:

ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023 | 2023 / Vol. 13876

Keywords:

Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations

DOI:

10.1007/978-3-031-30047-9_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent's behavior may reduce productivity in human-agent collaborations or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: What does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy, a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.


Pages: 314-326

Number of pages: 13

