Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Authors
Movin, Maria [1, 2]
Dinis Junior, Guilherme [1, 2]
Hollmen, Jaakko [1, 2]
Papapetrou, Panagiotis [2]
Affiliations
[1] Spotify, Stockholm, Sweden
[2] Stockholm Univ, Stockholm, Sweden
Keywords
Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations
DOI
10.1007/978-3-031-30047-9_25
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand the agent's behavior may reduce productivity in human-agent collaborations or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: What does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy, a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
Pages: 314-326
Number of pages: 13