Learning Dynamics and Generalization in Deep Reinforcement Learning

Cited by: 0
Authors
Lyle, Clare [1 ]
Rowland, Mark [2 ]
Dabney, Will [2 ]
Kwiatkowska, Marta [1]
Gal, Yarin [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] DeepMind, London, England
Funding
EU Horizon 2020
Keywords: (none listed)
DOI: Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training, and at the same time induces the second-order effect of discouraging generalization. We corroborate these findings in deep RL agents trained on a range of environments, finding that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods. Finally, we investigate how post-training policy distillation may avoid this pitfall, and show that this approach improves generalization to novel environments in the ProcGen suite and improves robustness to input perturbations.
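As a concrete illustration of the temporal difference update whose learning dynamics the abstract refers to, below is a minimal sketch of semi-gradient TD(0) with a linear value function on a toy random-walk chain. The environment, feature map, and hyperparameters are illustrative assumptions for exposition only, not the authors' experimental setup or code.

import numpy as np

# Minimal sketch: semi-gradient TD(0) value estimation on a toy
# random-walk chain. All choices here (chain length, one-hot features,
# learning rate) are illustrative assumptions, not the paper's setup.
rng = np.random.default_rng(0)

n_states, gamma, alpha = 5, 0.9, 0.1
features = np.eye(n_states)   # one-hot (tabular) features: V(s) = w @ phi(s)
w = np.zeros(n_states)        # linear value-function weights

def step(s):
    """Random walk: move left or right; reward 1 on reaching the right end."""
    s_next = int(s + rng.choice([-1, 1]))
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next in (0, n_states - 1)   # both ends are terminal
    return s_next, reward, done

for _ in range(500):                     # episodes
    s, done = n_states // 2, False       # start in the middle of the chain
    while not done:
        s_next, reward, done = step(s)
        v = w @ features[s]
        v_next = 0.0 if done else w @ features[s_next]
        # Semi-gradient TD(0): the bootstrapped target reward + gamma * V(s')
        # is held fixed, so no gradient flows through v_next.
        td_error = reward + gamma * v_next - v
        w += alpha * td_error * features[s]
        s = s_next

print("Estimated state values:", np.round(w, 2))

The detail to note is that the bootstrapped target reward + gamma * V(s') is treated as a constant during each update; this semi-gradient property is what distinguishes the temporal difference dynamics analyzed in the paper from ordinary supervised regression on fixed targets.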
Pages: 22