Reward Prediction for Representation Learning and Reward Shaping

Cited by: 2
Authors
Hlynsson, Hlynur David [1 ]
Wiskott, Laurenz [1 ]
Affiliations
[1] Ruhr Univ Bochum, Inst Neuroinformat, Univ Str 150, Bochum, Germany
Keywords
Reinforcement Learning; Representation Learning; Deep Learning; Machine Learning
DOI
10.5220/0010640200003063
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the fundamental challenges in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem is exacerbated when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in an environment with a single terminating goal state. We augment the training of out-of-the-box RL agents in single-goal environments with visual inputs by shaping the reward with our reward predictor during policy learning. Using our representation to preprocess high-dimensional observations, together with using the predictor for reward shaping, is shown to speed up learning for Actor-Critic using Kronecker-Factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO).
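The idea described in the abstract, a reward predictor whose encoder doubles as a state representation and whose output drives a shaping term, can be sketched as follows. This is a minimal PyTorch sketch under assumptions of mine: the encoder architecture, the potential-based form of the shaping term, and all hyperparameters are illustrative stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Encoder plus head that predicts a (raw or smoothed) reward from an observation."""
    def __init__(self, obs_channels=3, embed_dim=32):
        super().__init__()
        # Small convolutional encoder; its output doubles as the learned state representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim), nn.ReLU(),
        )
        self.head = nn.Linear(embed_dim, 1)  # scalar reward estimate

    def forward(self, obs):
        z = self.encoder(obs)                 # representation reusable as RL input
        return self.head(z).squeeze(-1), z

def shaped_reward(predictor, obs, next_obs, env_reward, gamma=0.99, coef=1.0):
    """Augment the environment reward with a shaping term derived from the predictor.
    A potential-based form phi(s) = predicted reward is assumed here for illustration."""
    with torch.no_grad():
        phi_s, _ = predictor(obs)
        phi_next, _ = predictor(next_obs)
    return env_reward + coef * (gamma * phi_next - phi_s)

# Self-supervised training step: regress the predictor onto observed (raw or smoothed) rewards.
predictor = RewardPredictor()
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
obs = torch.randn(8, 3, 64, 64)      # stand-in batch of visual observations
rewards = torch.rand(8)              # rewards collected from the environment
pred, _ = predictor(obs)
loss = nn.functional.mse_loss(pred, rewards)
opt.zero_grad(); loss.backward(); opt.step()
```

The encoder's features would then preprocess observations for an off-the-shelf agent (e.g. ACKTR or PPO), while shaped_reward replaces the raw environment reward during policy learning.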
Pages: 267-276
Number of pages: 10
Related papers (50 in total)
  • [1] Learning Robust Representation for Reinforcement Learning with Distractions by Reward Sequence Prediction
    Zhou, Qi
    Wang, Jie
    Liu, Qiyuan
    Kuang, Yufei
    Zhou, Wengang
    Li, Houqiang
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2551 - 2562
  • [2] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek
    Kudenko, Daniel
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 357 - 366
  • [3] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    [J]. AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [4] Belief Reward Shaping in Reinforcement Learning
    Marom, Ofir
    Rosman, Benjamin
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3762 - 3769
  • [5] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
    [J]. 2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023,
  • [6] Reward Shaping for Reinforcement Learning by Emotion Expressions
    Hwang, K. S.
    Ling, J. L.
    Chen, Yu-Ying
    Wang, Wei-Han
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 1288 - 1293
  • [7] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    [J]. 2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
  • [8] Reward Shaping Based Federated Reinforcement Learning
    Hu, Yiqiu
    Hua, Yun
    Liu, Wenyan
    Zhu, Jun
    [J]. IEEE ACCESS, 2021, 9 : 67259 - 67267
  • [9] Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
    Hu, Yujing
    Wang, Weixun
    Jia, Hangtian
    Wang, Yixiang
    Chen, Yingfeng
    Hao, Jianye
    Wu, Feng
    Fan, Changjie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [10] Systems Neuroscience: Shaping the Reward Prediction Error Signal
    Stauffer, William R.
    [J]. CURRENT BIOLOGY, 2015, 25 (22) : R1081 - R1084