Model gradient: unified model and policy learning in model-based reinforcement learning

Cited by: 0
Authors
Chengxing Jia
Fuxiang Zhang
Tian Xu
Jing-Cheng Pang
Zongzhang Zhang
Yang Yu
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology
[2] Polixir Technologies
Keywords
reinforcement learning; model-based reinforcement learning; Markov decision process
DOI
Not available
Abstract
Model-based reinforcement learning is a promising direction for improving the sample efficiency of reinforcement learning by learning a model of the environment. Previous model learning methods aim to fit the transition data and commonly use supervised learning to minimize the distance between the predicted state and the real state. These supervised methods, however, diverge from the ultimate goal of model learning, i.e., optimizing the policy that is learned in the model. In this work, we investigate how model learning and policy learning can share the same objective of maximizing the expected return in the real environment. We find that learning the model toward this objective yields a target of increasing the similarity between the gradient on generated data and the gradient on real data. We thus derive the gradient of the model from this target and propose the Model Gradient algorithm (MG), which integrates this novel model learning approach with policy-gradient-based policy optimization. We conduct experiments on multiple locomotion control tasks and find that MG not only achieves high sample efficiency but also leads to better convergence performance than traditional model-based reinforcement learning approaches.
Related papers
50 in total
  • [1] Model gradient: unified model and policy learning in model-based reinforcement learning
    Jia, Chengxing
    Zhang, Fuxiang
    Xu, Tian
    Pang, Jing-Cheng
    Zhang, Zongzhang
    Yu, Yang
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (04)
  • [2] Model-Based Reinforcement Learning via Proximal Policy Optimization
    Sun, Yuewen
    Yuan, Xin
    Liu, Wenzhang
    Sun, Changyin
    [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019 : 4736 - 4740
  • [3] A survey on model-based reinforcement learning
    Fan-Ming Luo
    Tian Xu
    Hang Lai
    Xiong-Hui Chen
    Weinan Zhang
    Yang Yu
    [J]. Science China Information Sciences, 2024, 67
  • [4] The ubiquity of model-based reinforcement learning
    Doll, Bradley B.
    Simon, Dylan A.
    Daw, Nathaniel D.
    [J]. CURRENT OPINION IN NEUROBIOLOGY, 2012, 22 (06) : 1075 - 1081
  • [5] Model-based Reinforcement Learning: A Survey
    Moerland, Thomas M.
    Broekens, Joost
    Plaat, Aske
    Jonker, Catholijn M.
    [J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2023, 16 (01) : 1 - 118
  • [6] Nonparametric model-based reinforcement learning
    Atkeson, CG
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 10, 1998, 10 : 1008 - 1014
  • [7] A survey on model-based reinforcement learning
    Fan-Ming LUO
    Tian XU
    Hang LAI
    Xiong-Hui CHEN
    Weinan ZHANG
    Yang YU
    [J]. Science China(Information Sciences), 2024, 67 (02) : 59 - 84
  • [8] Multiple model-based reinforcement learning
    Doya, K
    Samejima, K
    Katagiri, K
    Kawato, M
    [J]. NEURAL COMPUTATION, 2002, 14 (06) : 1347 - 1369
  • [9] A survey on model-based reinforcement learning
    Luo, Fan-Ming
    Xu, Tian
    Lai, Hang
    Chen, Xiong-Hui
    Zhang, Weinan
    Yu, Yang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (02)
  • [10] Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding
    Tan, Xiaoyu
    Qu, Chao
    Xiong, Junwu
    Zhang, James
    Qiu, Xihe
    Jin, Yaochu
    [J]. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04) : 2974 - 2986