Rethinking Value Function Learning for Generalization in Reinforcement Learning

Cited: 0
Authors
Moon, Seungyong [1 ,2 ]
Lee, JunYeong [1 ,2 ]
Song, Hyun Oh [1 ,2 ,3 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Neural Proc Res Ctr, Seoul, South Korea
[3] DeepMetrics, Seoul, South Korea
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are separately optimized using a disjoint network architecture to avoid interference and obtain a more accurate value function. We identify that a value network in the multi-environment setting is more challenging to optimize and more prone to memorizing the training data than in the conventional single-environment setting. In addition, we find that appropriate regularization of the value network is necessary to improve both training and test performance. To this end, we propose Delayed-Critic Policy Gradient (DCPG), a policy gradient algorithm that implicitly penalizes value estimates by optimizing the value network less frequently, but with more training data, than the policy network. This can be implemented using a single unified network architecture. Furthermore, we introduce a simple self-supervised task that learns the forward and inverse dynamics of environments using a single discriminator, which can be jointly optimized with the value network. Our proposed algorithms significantly improve observational generalization performance and sample efficiency on the Procgen Benchmark.
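A minimal sketch of the delayed value-update schedule described in the abstract, assuming a PyTorch-style implementation. The network sizes, the hyperparameter names (value_update_freq, rollout_buffer), and the random placeholder rollouts are illustrative assumptions rather than details from the paper; the actual method builds on a PPO-style objective and additionally trains the dynamics discriminator, both of which are omitted here.

```python
# Sketch: shared actor-critic where the policy head is updated every iteration,
# while the value head is updated less often but on several stored rollouts.
import torch
import torch.nn as nn
from collections import deque


class SharedActorCritic(nn.Module):
    """Single unified network: policy and value heads on a shared encoder."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        z = self.encoder(obs)
        return self.policy_head(z), self.value_head(z).squeeze(-1)


obs_dim, n_actions = 8, 4
net = SharedActorCritic(obs_dim, n_actions)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)

value_update_freq = 8                              # value head trained once every 8 iterations
rollout_buffer = deque(maxlen=value_update_freq)   # ...on the rollouts collected since then

for it in range(32):
    # Placeholder rollout: random observations, return targets, and actions.
    obs = torch.randn(256, obs_dim)
    returns = torch.randn(256)
    actions = torch.randint(n_actions, (256,))
    rollout_buffer.append((obs, returns, actions))

    # Policy update every iteration. A plain advantage-weighted log-likelihood
    # objective stands in for the PPO surrogate used in practice.
    logits, values = net(obs)
    advantages = (returns - values).detach()
    logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
    policy_loss = -(advantages * logp).mean()
    opt.zero_grad()
    policy_loss.backward()
    opt.step()

    # Delayed value update: less frequent, but over all accumulated rollouts,
    # which implicitly regularizes the value estimates.
    if (it + 1) % value_update_freq == 0:
        for b_obs, b_ret, _ in rollout_buffer:
            _, b_val = net(b_obs)
            value_loss = 0.5 * (b_val - b_ret).pow(2).mean()
            opt.zero_grad()
            value_loss.backward()
            opt.step()
```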
Pages: 13
Related Papers
50 records in total
  • [21] On the Generalization Gap in Reparameterizable Reinforcement Learning
    Wang, Huan
    Zheng, Stephan
    Xiong, Caiming
    Socher, Richard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [22] On the Importance of Exploration for Generalization in Reinforcement Learning
    Jiang, Yiding
    Kolter, J. Zico
    Raileanu, Roberta
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [23] A Cautious Approach to Generalization in Reinforcement Learning
    Fonteneau, Raphael
    Murphy, Susan A.
    Wehenkel, Louis
    Ernst, Damien
    ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE, 2010, : 64 - 73
  • [24] Generalization to New Actions in Reinforcement Learning
    Jain, Ayush
    Szot, Andrew
    Lim, Joseph J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [25] Generalization of Secondary Reinforcement in Discrimination Learning
    Ehrenfreund, D.
    JOURNAL OF COMPARATIVE AND PHYSIOLOGICAL PSYCHOLOGY, 1954, 47 (04): : 311 - 314
  • [26] High Confidence Generalization for Reinforcement Learning
    Kostas, James E.
    Chandak, Yash
    Jordan, Scott M.
    Theocharous, Georgios
    Thomas, Philip S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [27] Rethinking Reinforcement Learning for Cloud Elasticity
    Lolos, Konstantinos
    Konstantinou, Ioannis
    Kantere, Verena
    Koziris, Nectarios
    PROCEEDINGS OF THE 2017 SYMPOSIUM ON CLOUD COMPUTING (SOCC '17), 2017, : 648 - 648
  • [28] Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning
    Nafi, Nasik Muhammad
    Glasscock, Creighton
    Hsu, William
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 15 - 22
  • [29] Algebraic Reinforcement Learning: Hypothesis Induction for Relational Reinforcement Learning Using Term Generalization
    Neubert, Stefanie
    Belzner, Lenz
    Wirsing, Martin
    LOGIC, REWRITING, AND CONCURRENCY, 2015, 9200 : 562 - 579
  • [30] Coordinating SON Instances: Reinforcement Learning with Distributed Value Function
    Iacoboaiea, Ovidiu
    Sayrac, Berna
    Ben Jemaa, Sana
    Bianchi, Pascal
    2014 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR, AND MOBILE RADIO COMMUNICATION (PIMRC), 2014, : 1642 - 1646