Explaining Black Box Reinforcement Learning Agents Through Counterfactual Policies

Citations: 0
Authors
Movin, Maria [1 ,2 ]
Dinis Junior, Guilherme [1 ,2 ]
Hollmen, Jaakko [1 ,2 ]
Papapetrou, Panagiotis [2 ]
Affiliations
[1] Spotify, Stockholm, Sweden
[2] Stockholm Univ, Stockholm, Sweden
Keywords
Explainable AI (XAI); Reinforcement Learning; Counterfactual Explanations;
DOI
10.1007/978-3-031-30047-9_25
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite the increased attention to explainable AI, explainability methods for understanding reinforcement learning (RL) agents have not been extensively studied. Failing to understand an agent's behavior may reduce productivity in human-agent collaborations or cause mistrust in automated RL systems. RL agents are trained to optimize a long-term cumulative reward, and in this work we formulate a novel problem: how to generate explanations of when an agent could have taken another action to optimize an alternative reward. More concretely, we aim to answer the question: what does an RL agent need to do differently to achieve an alternative target outcome? We introduce the concept of a counterfactual policy, a policy trained to explain in which states a black box agent could have taken an alternative action to achieve another desired outcome. The usefulness of counterfactual policies is demonstrated in two experiments with different use cases, and the results suggest that our solution can provide interpretable explanations.
Pages: 314-326
Page count: 13
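
To make the idea in the abstract concrete, below is a minimal sketch (not the authors' implementation) of how a counterfactual policy could be obtained and used for explanation: a second agent is trained on an alternative reward, and the explanation is the set of states where its greedy action deviates from the black-box agent's. The toy chain environment, reward functions, and Q-learning hyperparameters are hypothetical stand-ins.

import numpy as np

N_STATES, N_ACTIONS = 5, 2  # toy chain MDP: action 0 moves left, action 1 moves right

def step(state, action):
    # Deterministic transition along the chain.
    return min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)

def train_q(reward_fn, episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    # Plain tabular Q-learning; returns the learned Q-table.
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(horizon):
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
            s2 = step(s, a)
            r = reward_fn(s, a, s2)
            q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
            s = s2
    return q

# "Black-box" agent: optimizes the original reward (reach the rightmost state).
q_black_box = train_q(lambda s, a, s2: float(s2 == N_STATES - 1))

# Counterfactual policy: trained on an alternative target outcome
# (reach the leftmost state instead).
q_counterfactual = train_q(lambda s, a, s2: float(s2 == 0))

# The explanation: states in which the counterfactual policy acts differently
# from the black-box agent, i.e. where behavior would have to change to
# achieve the alternative outcome.
for s in range(N_STATES):
    a_bb = int(q_black_box[s].argmax())
    a_cf = int(q_counterfactual[s].argmax())
    if a_bb != a_cf:
        print(f"state {s}: black-box action {a_bb} -> counterfactual action {a_cf}")

In this sketch the printed state-action pairs play the role of the explanation: they tell the user in which states, and how, the agent would need to behave differently to reach the alternative target outcome.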