Belief Reward Shaping in Reinforcement Learning

Times cited: 0

Authors:

Marom, Ofir [1]

Rosman, Benjamin [1,2]

Affiliations:

[1] Univ Witwatersrand, Johannesburg, South Africa

[2] CSIR, Pretoria, South Africa

Source:

THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018

Keywords: (none listed)

DOI: not available

Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]

Discipline classification codes: 081104; 0812; 0835; 1405

Abstract:

A key challenge in many reinforcement learning problems is delayed rewards, which can significantly slow down learning. Although reward shaping has previously been introduced to accelerate learning by bootstrapping an agent with additional information, it can lead to problems with convergence. We present a novel Bayesian reward shaping framework that augments the reward distribution with prior beliefs that decay with experience. Formally, we prove that under suitable conditions a Markov decision process (MDP) augmented with our framework is consistent with the optimal policy of the original MDP when using the Q-learning algorithm. More generally, our method integrates seamlessly with any reinforcement learning algorithm that learns a value or action-value function through experience. Experiments on a gridworld and a more complex backgammon domain show that we can learn tasks significantly faster when we specify intuitive priors on the reward distribution.
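The abstract describes the framework only at a high level, so the following is a minimal Python sketch under our own assumptions, not the authors' formulation. Each (state, action) pair carries a prior reward guess worth `n0` pseudo-observations, and the reward fed into tabular Q-learning is the pseudo-count posterior mean (n0 * prior + sum of observed rewards) / (n0 + n). The prior's weight n0 / (n0 + n) therefore decays as real experience n accumulates. The class `BeliefRewardQLearner`, the parameter `prior_strength`, the toy corridor `run_chain` and all hyperparameters are hypothetical illustrations.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence


class BeliefRewardQLearner:
    """Tabular Q-learning that learns from a belief-smoothed reward.

    Hypothetical sketch: each (state, action) pair starts with a prior reward
    guess worth `prior_strength` pseudo-observations. The reward used in the
    Q-update is the posterior-mean estimate, so the prior's weight
    n0 / (n0 + n) decays as real observations n accumulate.
    """

    def __init__(self, actions: Sequence[int],
                 prior_mean: Callable[[Hashable, int], float],
                 prior_strength: float = 5.0, alpha: float = 0.1,
                 gamma: float = 0.95, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.prior_mean = prior_mean            # assumed: user-supplied prior guess per (s, a)
        self.prior_strength = prior_strength    # assumed: pseudo-count n0 for the prior
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)             # Q(s, a), initialised to 0
        self.count = defaultdict(int)           # n: real visits of (s, a)
        self.reward_sum = defaultdict(float)    # sum of observed rewards for (s, a)
        self.rng = random.Random(seed)

    def believed_reward(self, s, a) -> float:
        # Posterior mean: prior acts like n0 extra observations of prior_mean(s, a).
        n0, n = self.prior_strength, self.count[(s, a)]
        return (n0 * self.prior_mean(s, a) + self.reward_sum[(s, a)]) / (n0 + n)

    def act(self, s) -> int:
        # Epsilon-greedy; ties go to the first action in self.actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r: float, s_next, done: bool) -> None:
        self.count[(s, a)] += 1
        self.reward_sum[(s, a)] += r
        r_hat = self.believed_reward(s, a)      # belief-shaped reward replaces raw r
        target = r_hat
        if not done:
            target += self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def run_chain(agent: BeliefRewardQLearner, episodes: int = 500,
              length: int = 5, max_steps: int = 100) -> BeliefRewardQLearner:
    """Toy 1-D corridor (hypothetical): reward 1 only on reaching the right end."""
    goal = length - 1
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = agent.act(s)
            s_next = min(max(s + a, 0), goal)   # walls keep the agent in [0, goal]
            done = s_next == goal
            r = 1.0 if done else 0.0            # delayed reward: zero until the goal
            agent.update(s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return agent


if __name__ == "__main__":
    agent = run_chain(BeliefRewardQLearner(
        actions=(-1, 1),
        prior_mean=lambda s, a: 0.1 if a == 1 else 0.0,  # intuitive prior: "right looks good"
    ))
    greedy = [max(agent.actions, key=lambda a: agent.q[(s, a)]) for s in range(4)]
    print(greedy)
```

In this toy corridor the prior simply encodes the intuition that moving right is promising, which nudges early exploration toward the delayed goal reward. Because the prior's weight shrinks toward zero, the reward used in the limit is the empirical mean of the observed rewards. This is one informal way to read the abstract's consistency claim; the paper's own conditions and proof define the actual guarantee.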


Pages: 3762-3769

Number of pages: 8

