Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Cited by: 0
Authors
Movin, Maria [1 ,2 ]
Dinis Junior, Guilherme [1 ,2 ]
Hollmen, Jaakko [1 ,2 ]
Papapetrou, Panagiotis [2 ]
Affiliations
[1] Spotify, Stockholm, Sweden
[2] Stockholm Univ, Stockholm, Sweden
Keywords
Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations
DOI
10.1007/978-3-031-30047-9_25
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent's behavior may reduce productivity in human-agent collaborations, or foster mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: What does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy: a policy trained to identify the states in which a black-box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
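The abstract's core idea can be illustrated with a minimal, hypothetical sketch: train a second policy against an *alternative* reward, then report the states where it disagrees with the black-box agent. The 1-D chain environment, the `train_policy` helper, and the use of tabular Q-learning here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

N_STATES = 5
ACTIONS = (-1, +1)  # move left / move right on a 1-D chain

def train_policy(reward, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning against a per-state reward vector."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(20):
            # epsilon-greedy action selection
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(q[s].argmax())
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            q[s, a] += alpha * (reward[s2] + gamma * q[s2].max() - q[s, a])
            s = s2
    return q.argmax(axis=1)  # greedy action index per state

# "Black-box" agent: trained to reach the right end of the chain.
black_box = train_policy(np.array([0.0, 0, 0, 0, 1.0]))
# Counterfactual policy: trained for the alternative target outcome (left end).
counterfactual = train_policy(np.array([1.0, 0, 0, 0, 0.0]))

# The explanation is the set of states where the agent would have to
# act differently to achieve the alternative outcome.
diff_states = [s for s in range(N_STATES) if black_box[s] != counterfactual[s]]
print("act differently in states:", diff_states)
```

Here both policies converge to opposite greedy actions, so nearly every state is flagged; in richer environments the disagreement set would be sparser and more informative.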
Pages: 314-326 (13 pages)
Related Papers (50 total)
  • [1] Explaining Reinforcement Learning Agents through Counterfactual Action Outcomes
    Amitai, Yotam
    Septon, Yael
    Amir, Ofra
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10003 - 10011
  • [2] Explaining Black Box Drug Target Prediction Through Model Agnostic Counterfactual Samples
    Nguyen, Tri Minh
    Quinn, Thomas P.
    Nguyen, Thin
    Tran, Truyen
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (02) : 1020 - 1029
  • [3] Explaining the black-box smoothly-A counterfactual approach
    Singla, Sumedha
    Eslami, Motahhare
    Pollack, Brian
    Wallace, Stephen
    Batmanghelich, Kayhan
    MEDICAL IMAGE ANALYSIS, 2023, 84
  • [4] A novel policy-graph approach with natural language and counterfactual abstractions for explaining reinforcement learning agents
    Liu, Tongtong
    McCalmon, Joe
    Le, Thai
    Rahman, Md Asifur
    Lee, Dongwon
    Alqahtani, Sarra
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2023, 37 (02)
  • [6] EDGE: Explaining Deep Reinforcement Learning Policies
    Guo, Wenbo
    Wu, Xian
    Khan, Usmann
    Xing, Xinyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [7] Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
    Mothilal, Ramaravind K.
    Sharma, Amit
    Tan, Chenhao
    FAT* '20: PROCEEDINGS OF THE 2020 CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, 2020, : 607 - 617
  • [8] Explaining Black Box Models Through Twin Systems
    Cau, Federico Maria
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES COMPANION (IUI'20), 2020, : 27 - 28
  • [9] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [10] Adversarial Black-Box Attacks on Vision-based Deep Reinforcement Learning Agents
    Tanev, Atanas
    Pavlitskaya, Svetlana
    Sigloch, Joan
    Roennau, Arne
    Dillmann, Ruediger
    Zoellner, J. Marius
    2021 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SAFETY FOR ROBOTICS (ISR), 2021, : 177 - 181