Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Cited by: 0
Authors
Mesnard, Thomas [1 ]
Weber, Theophane [1 ]
Viola, Fabio [1 ]
Thakoor, Shantanu [1 ]
Saade, Alaa [1 ]
Harutyunyan, Anna [1 ]
Dabney, Will [1 ]
Stepleton, Tom [1 ]
Heess, Nicolas [1 ]
Guez, Arthur [1 ]
Moulines, Eric [2 ]
Hutter, Marcus [1 ]
Buesing, Lars [1 ]
Munos, Remi [1 ]
Affiliations
[1] DeepMind, London, England
[2] Ecole Polytech, CMAP, INRIA XPOP, Palaiseau, France
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We formulate a family of policy gradient algorithms that use these future-conditional value functions as baselines or critics, and show that they are provably low variance. To avoid the potential bias from conditioning on future information, we constrain the hindsight information to not contain information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative and challenging problems.
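The abstract's central idea, a policy-gradient baseline conditioned on hindsight information extracted from the future of a trajectory, can be illustrated with a minimal sketch. Everything below is a toy construction for illustration: the trajectory data is random, the hindsight features `phi` are hand-picked statistics of future rewards (in the paper they are learned and constrained to carry no information about the agent's actions), and a fixed random linear map stands in for the learned value network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trajectory data (shapes and values are illustrative, not from the paper).
T, d_state = 5, 3
states = rng.normal(size=(T, d_state))
rewards = rng.normal(size=T)
gamma = 0.99

# Discounted returns-to-go: G_t = sum_{k >= t} gamma^(k - t) * r_k.
returns = np.zeros(T)
acc = 0.0
for t in reversed(range(T)):
    acc = rewards[t] + gamma * acc
    returns[t] = acc

# Hindsight features Phi_t summarizing the *future* of the trajectory.
# Here: two hand-picked statistics of future rewards. The paper instead
# learns Phi and constrains it to be uninformative about the agent's
# actions, which is what keeps the future-conditional baseline unbiased.
future_sums = np.array([rewards[t:].sum() for t in range(T)])
phi = np.stack([future_sums, future_sums ** 2], axis=1)  # shape (T, 2)

# Future-conditional baseline b(s_t, Phi_t); a fixed random linear map
# stands in for the learned value network.
V = rng.normal(size=(d_state + phi.shape[1],))
baseline = np.concatenate([states, phi], axis=1) @ V

# Advantage fed into the policy-gradient update: G_t - b(s_t, Phi_t).
# Conditioning b on Phi_t lets it absorb "luck" (external randomness in
# the future), lowering the variance of the gradient estimate.
advantages = returns - baseline
```

Because the baseline may depend on anything that is independent of the action at time t, subtracting b(s_t, Phi_t) leaves the policy gradient unbiased while removing variance due to factors outside the agent's control.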
Pages: 11
Related Papers
50 records
  • [1] Model-Free Neural Counterfactual Regret Minimization With Bootstrap Learning
    Liu, Weiming
    Li, Bin
    Togelius, Julian
    IEEE TRANSACTIONS ON GAMES, 2023, 15 (03): 315-325
  • [2] Learning Representations in Model-Free Hierarchical Reinforcement Learning
    Rafati, Jacob
    Noelle, David C.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 10009-10010
  • [3] Online Nonstochastic Model-Free Reinforcement Learning
    Ghai, Udaya
    Gupta, Arushi
    Xia, Wenhan
    Singh, Karan
    Hazan, Elad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [4] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [5] Model-Free Quantum Control with Reinforcement Learning
    Sivak, V. V.
    Eickbusch, A.
    Liu, H.
    Royer, B.
    Tsioutsios, I.
    Devoret, M. H.
    PHYSICAL REVIEW X, 2022, 12 (01)
  • [6] Model-Free Active Exploration in Reinforcement Learning
    Russo, Alessio
    Proutiere, Alexandre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [7] Recovering Robustness in Model-Free Reinforcement Learning
    Venkataraman, Harish K.
    Seiler, Peter J.
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019: 4210-4216
  • [8] Model-Free Reinforcement Learning Algorithms: A Survey
    Calisir, Sinan
    Pehlivanoglu, Meltem Kurt
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [9] Retrospective model-based inference guides model-free credit assignment
    Moran, Rani
    Keramati, Mehdi
    Dayan, Peter
    Dolan, Raymond J.
    NATURE COMMUNICATIONS, 2019, 10 (1)