Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Cited by: 0
Authors
Movin, Maria [1 ,2 ]
Dinis Junior, Guilherme [1 ,2 ]
Hollmen, Jaakko [1 ,2 ]
Papapetrou, Panagiotis [2 ]
Affiliations
[1] Spotify, Stockholm, Sweden
[2] Stockholm Univ, Stockholm, Sweden
Keywords
Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations
DOI
10.1007/978-3-031-30047-9_25
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent's behavior may reduce productivity in human-agent collaborations, or foster mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: What does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy: a policy trained to identify the states in which a black-box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
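The abstract's core idea can be illustrated with a minimal, hypothetical sketch: train a second policy against an *alternative* reward, then report the states where it disagrees with the black-box agent. The 1-D chain environment, the `train_policy` helper, and the use of tabular Q-learning here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

N_STATES = 5
ACTIONS = (-1, +1)  # move left / move right on a 1-D chain

def train_policy(reward, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning against a per-state reward vector."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(20):
            # epsilon-greedy action selection
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(q[s].argmax())
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            q[s, a] += alpha * (reward[s2] + gamma * q[s2].max() - q[s, a])
            s = s2
    return q.argmax(axis=1)  # greedy action index per state

# "Black-box" agent: trained to reach the right end of the chain.
black_box = train_policy(np.array([0.0, 0, 0, 0, 1.0]))
# Counterfactual policy: trained for the alternative target outcome (left end).
counterfactual = train_policy(np.array([1.0, 0, 0, 0, 0.0]))

# The explanation is the set of states where the agent would have to
# act differently to achieve the alternative outcome.
diff_states = [s for s in range(N_STATES) if black_box[s] != counterfactual[s]]
print("act differently in states:", diff_states)
```

Here both policies converge to opposite greedy actions, so nearly every state is flagged; in richer environments the disagreement set would be sparser and more informative.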
Pages: 314-326 (13 pages)
Related Papers (50 total)
  • [1] Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes
    Amitai, Yotam
    Septon, Yael
    Amir, Ofra
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10003 - 10011
  • [2] Explaining Black Box Drug Target Prediction Through Model Agnostic Counterfactual Samples
    Nguyen, Tri Minh
    Quinn, Thomas P.
    Nguyen, Thin
    Tran, Truyen
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (02) : 1020 - 1029
  • [3] Explaining the black-box smoothly-A counterfactual approach
    Singla, Sumedha
    Eslami, Motahhare
    Pollack, Brian
    Wallace, Stephen
    Batmanghelich, Kayhan
    MEDICAL IMAGE ANALYSIS, 2023, 84
  • [4] A novel policy-graph approach with natural language and counterfactual abstractions for explaining reinforcement learning agents
    Liu, Tongtong
    McCalmon, Joe
    Le, Thai
    Rahman, Md Asifur
    Lee, Dongwon
    Alqahtani, Sarra
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2023, 37 (02)
  • [6] EDGE: Explaining Deep Reinforcement Learning Policies
    Guo, Wenbo
    Wu, Xian
    Khan, Usmann
    Xing, Xinyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
    Mothilal, Ramaravind K.
    Sharma, Amit
    Tan, Chenhao
    FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, : 607 - 617
  • [8] Explaining Black Box Models Through Twin Systems
    Cau, Federico Maria
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES COMPANION (IUI'20), 2020, : 27 - 28
  • [9] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [10] Adversarial Black-Box Attacks on Vision-based Deep Reinforcement Learning Agents
    Tanev, Atanas
    Pavlitskaya, Svetlana
    Sigloch, Joan
    Roennau, Arne
    Dillmann, Ruediger
    Zoellner, J. Marius
    2021 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SAFETY FOR ROBOTICS (ISR), 2021, : 177 - 181