Rethinking Value Function Learning for Generalization in Reinforcement Learning

Cited: 0
Authors
Moon, Seungyong [1 ,2 ]
Lee, JunYeong [1 ,2 ]
Song, Hyun Oh [1 ,2 ,3 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Neural Proc Res Ctr, Seoul, South Korea
[3] DeepMetrics, Seoul, South Korea
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
Our work focuses on training RL agents on multiple visually diverse environments to improve observational generalization performance. In prior methods, policy and value networks are optimized separately using a disjoint network architecture to avoid interference and obtain a more accurate value function. We identify that a value network in the multi-environment setting is more challenging to optimize and more prone to memorizing the training data than in the conventional single-environment setting. In addition, we find that appropriate regularization on the value network is necessary to improve both training and test performance. To this end, we propose Delayed-Critic Policy Gradient (DCPG), a policy gradient algorithm that implicitly penalizes value estimates by optimizing the value network less frequently, but with more training data, than the policy network. This can be implemented using a single unified network architecture. Furthermore, we introduce a simple self-supervised task that learns the forward and inverse dynamics of environments using a single discriminator, which can be jointly optimized with the value network. Our proposed algorithms significantly improve observational generalization performance and sample efficiency on the Procgen Benchmark.
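The core scheduling idea in the abstract, updating the value head less frequently but on more accumulated data than the policy head, can be illustrated with a toy sketch. Everything below (the linear toy model, the update period `value_delay`, the placeholder policy step) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

policy_w = np.zeros(4)   # policy head parameters (toy)
value_w = np.zeros(4)    # value head parameters (toy)
value_delay = 4          # update the value head once every 4 policy updates
rollout_buffer = []      # accumulates rollouts between value updates

for it in range(12):
    # Collect a toy "rollout": observations and their (noisy) returns.
    obs = rng.normal(size=(8, 4))
    returns = obs @ np.ones(4) + rng.normal(scale=0.1, size=8)
    rollout_buffer.append((obs, returns))

    # The policy head is updated every iteration (placeholder gradient step).
    policy_w += 0.01 * rng.normal(size=4)

    # The value head is updated less frequently, using all rollouts collected
    # since its last update -- fewer updates, each on more data.
    if (it + 1) % value_delay == 0:
        all_obs = np.concatenate([o for o, _ in rollout_buffer])
        all_ret = np.concatenate([r for _, r in rollout_buffer])
        # One least-squares fit stands in for several SGD epochs here.
        value_w = np.linalg.lstsq(all_obs, all_ret, rcond=None)[0]
        rollout_buffer.clear()
```

The infrequent, larger-batch value fit is the regularization mechanism the abstract describes: the value estimates lag the freshest data, which the authors argue implicitly penalizes overfitting to the training environments.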
Pages: 13