Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards

Cited by: 0
Authors:
Devidze, Rati [1 ]
Kamalaruban, Parameswaran [2 ]
Singla, Adish [1 ]
Affiliations:
[1] Max Planck Inst Software Syst MPI SWS, Saarbrucken, Germany
[2] Alan Turing Inst, London, England
Funding:
European Research Council
Keywords: (none listed)
DOI: not available
CLC Number: TP18 [Theory of Artificial Intelligence]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
We study the problem of reward shaping to accelerate the training process of a reinforcement learning agent. Existing works have considered a number of different reward shaping formulations; however, they either require external domain knowledge or fail in environments with extremely sparse rewards. In this paper, we propose a novel framework, Exploration-Guided Reward Shaping (EXPLORS), that operates in a fully self-supervised manner and can accelerate an agent's learning even in sparse-reward environments. The key idea of EXPLORS is to learn an intrinsic reward function in combination with exploration-based bonuses to maximize the agent's utility w.r.t. extrinsic rewards. We theoretically showcase the usefulness of our reward shaping framework in a special family of MDPs. Experimental results on several environments with sparse/noisy reward signals demonstrate the effectiveness of EXPLORS.
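The abstract's key idea — augmenting sparse extrinsic rewards with exploration-driven bonuses — can be illustrated with a minimal sketch. Note this is not the EXPLORS algorithm itself (which *learns* the intrinsic reward function); it is a generic count-based novelty bonus, and the class name `CountBasedShaper` and parameter `beta` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
import math

class CountBasedShaper:
    """Adds a count-based exploration bonus to a sparse extrinsic reward.

    Illustrative stand-in only: EXPLORS learns its intrinsic reward in a
    self-supervised way; here a fixed beta / sqrt(N(s)) novelty bonus is
    used so the shaping idea is concrete.
    """

    def __init__(self, beta=0.5):
        self.beta = beta
        self.visit_counts = defaultdict(int)  # N(s): visits per state

    def shape(self, state, extrinsic_reward):
        # Bonus decays as a state is revisited, steering the agent
        # toward under-explored regions even when extrinsic reward is 0.
        self.visit_counts[state] += 1
        bonus = self.beta / math.sqrt(self.visit_counts[state])
        return extrinsic_reward + bonus

shaper = CountBasedShaper(beta=0.5)
r1 = shaper.shape(state=0, extrinsic_reward=0.0)  # first visit: full bonus 0.5
r2 = shaper.shape(state=0, extrinsic_reward=0.0)  # repeat visit: 0.5 / sqrt(2)
```

In a sparse-reward environment the extrinsic reward is almost always zero, so the decaying bonus supplies the only learning signal early on, which is the intuition behind combining learned intrinsic rewards with exploration bonuses.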
Pages: 14
Related Papers (50 total)
  • [1] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong; Cao, Weihua; Guo, Linwei; Gan, Chao; Wu, Min
    2023 IEEE 6th International Conference on Industrial Cyber-Physical Systems (ICPS), 2023
  • [2] Online learning of shaping rewards in reinforcement learning
    Grzes, Marek; Kudenko, Daniel
    Neural Networks, 2010, 23(4): 541-550
  • [3] Intermittent Reinforcement Learning with Sparse Rewards
    Sahoo, Prachi Pratyusha; Vamvoudakis, Kyriakos G.
    2022 American Control Conference (ACC), 2022: 2709-2714
  • [4] Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
    Hu, Yujing; Wang, Weixun; Jia, Hangtian; Wang, Yixiang; Chen, Yingfeng; Hao, Jianye; Wu, Feng; Fan, Changjie
    Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, 33
  • [5] Offline Reinforcement Learning with Failure Under Sparse Reward Environments
    Wu, Mingkang; Siddique, Umer; Sinha, Abhinav; Cao, Yongcan
    2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), 2024
  • [6] Belief Reward Shaping in Reinforcement Learning
    Marom, Ofir; Rosman, Benjamin
    Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018: 3762-3769
  • [7] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek; Kudenko, Daniel
    Artificial Neural Networks - ICANN 2008, Part I, 2008, 5163: 357-366
  • [8] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS '17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017: 565-573
  • [9] Collaborative Exploration and Reinforcement Learning between Heterogeneously Skilled Agents in Environments with Sparse Rewards
    Andres, Alain; Villar-Rodriguez, Esther; Martinez, Aritz D.; Del Ser, Javier
    2021 International Joint Conference on Neural Networks (IJCNN), 2021
  • [10] A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment
    Liu, Xi; Ma, Long; Chen, Zhen; Zheng, Changgang; Chen, Ren; Liao, Yong; Yang, Shufan
    Artificial Intelligence XL (AI 2023), 2023, 14381: 216-221