Rethinking Value Function Learning for Generalization in Reinforcement Learning

Cited by: 0
Authors
Moon, Seungyong [1 ,2 ]
Lee, JunYeong [1 ,2 ]
Song, Hyun Oh [1 ,2 ,3 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Neural Proc Res Ctr, Seoul, South Korea
[3] DeepMetrics, Seoul, South Korea
Keywords: (none listed)
DOI: (none available)
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a disjoint network architecture to avoid interference and obtain a more accurate value function. We identify that a value network in the multi-environment setting is more challenging to optimize and more prone to memorizing the training data than in the conventional single-environment setting. In addition, we find that appropriate regularization on the value network is necessary to improve both training and test performance. To this end, we propose Delayed-Critic Policy Gradient (DCPG), a policy gradient algorithm that implicitly penalizes value estimates by optimizing the value network less frequently, with more training data, than the policy network. This can be implemented using a single unified network architecture. Furthermore, we introduce a simple self-supervised task that learns the forward and inverse dynamics of environments using a single discriminator, which can be jointly optimized with the value network. Our proposed algorithms significantly improve observational generalization performance and sample efficiency on the Procgen Benchmark.
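The core mechanism described in the abstract is an update schedule: the policy is optimized on every rollout, while the value network is optimized only every few rollouts, on the pooled data collected since its last update, with both heads sharing one encoder. The following is a minimal PyTorch sketch of that schedule under stated assumptions; the collect_rollout helper, network sizes, PPO-style clipped objective, and the value_freq hyperparameter are illustrative, not the authors' exact implementation.

import torch
import torch.nn as nn

# Hypothetical shared-encoder actor-critic (a single unified architecture,
# as the abstract describes). Layer sizes are illustrative; the 15-way
# action head matches Procgen's discrete action space.
class ActorCritic(nn.Module):
    def __init__(self, obs_dim=64, n_actions=15):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, n_actions)
        self.value_head = nn.Linear(256, 1)

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def train(model, collect_rollout, n_iters=1000, value_freq=8):
    # collect_rollout(model) is an assumed helper returning a dict of
    # tensors: "obs", "actions", "old_logp", "adv", "returns".
    opt = torch.optim.Adam(model.parameters(), lr=5e-4)
    pooled = []  # rollouts accumulated since the last value update
    for it in range(n_iters):
        batch = collect_rollout(model)
        pooled.append(batch)

        # Policy update on every iteration, using the freshest rollout
        # only (PPO-style clipped objective; 0.2 clip range assumed).
        logits, _ = model(batch["obs"])
        dist = torch.distributions.Categorical(logits=logits)
        ratio = torch.exp(dist.log_prob(batch["actions"]) - batch["old_logp"])
        adv = batch["adv"]
        policy_loss = -torch.min(ratio * adv, ratio.clamp(0.8, 1.2) * adv).mean()
        opt.zero_grad()
        policy_loss.backward()
        opt.step()

        # Delayed value update: less frequent, but on more data.
        if (it + 1) % value_freq == 0:
            obs = torch.cat([b["obs"] for b in pooled])
            ret = torch.cat([b["returns"] for b in pooled])
            _, values = model(obs)
            value_loss = 0.5 * (values - ret).pow(2).mean()
            opt.zero_grad()
            value_loss.backward()
            opt.step()
            pooled.clear()

Because the two heads share an encoder, updating the value head less often plausibly acts as the implicit regularization on value estimates that the abstract attributes to DCPG; the self-supervised forward/inverse dynamics task would add a further loss jointly optimized with the value objective, which this sketch omits.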
Pages: 13
Related Papers (10 of 50 shown)
  • [1] Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization
    Wen, Zheng
    Van Roy, Benjamin
    MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42(3): 762-782
  • [2] Generalization of value in reinforcement learning by humans
    Wimmer, G. Elliott
    Daw, Nathaniel D.
    Shohamy, Daphna
    EUROPEAN JOURNAL OF NEUROSCIENCE, 2012, 35(7): 1092-1104
  • [3] Decoupling Value and Policy for Generalization in Reinforcement Learning
    Raileanu, Roberta
    Fergus, Rob
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
  • [4] The Value Function Polytope in Reinforcement Learning
    Dadashi, Robert
    Taiga, Adrien Ali
    Le Roux, Nicolas
    Schuurmans, Dale
    Bellemare, Marc G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [5] Policy Optimization with Augmented Value Targets for Generalization in Reinforcement Learning
    Nafi, Nasik Muhammad
    Poggi-Corradini, Giovanni
    Hsu, William
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [6] Learning Dynamics and Generalization in Deep Reinforcement Learning
    Lyle, Clare
    Rowland, Mark
    Dabney, Will
    Kwiatkowska, Marta
    Gal, Yarin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [7] Supervised Reinforcement Learning via Value Function
    Pan, Yaozong
    Zhang, Jian
    Yuan, Chunhui
    Yang, Haitao
    SYMMETRY-BASEL, 2019, 11(4)
  • [8] Robust Reinforcement Learning with a Stochastic Value Function
    Hatsugai, Reiji
    Inaba, Mary
    MACHINE LEARNING, OPTIMIZATION, AND BIG DATA, MOD 2017, 2018, 10710: 519-526
  • [9] On the Generalization of Representations in Reinforcement Learning
    Le Lan, Charline
    Tu, Stephen
    Oberman, Adam
    Agarwal, Rishabh
    Bellemare, Marc
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022
  • [10] Quantifying Generalization in Reinforcement Learning
    Cobbe, Karl
    Klimov, Oleg
    Hesse, Chris
    Kim, Taehoon
    Schulman, John
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019