A comparison of plan-based and abstract MDP reward shaping

Cited by: 2
Authors
Efthymiadis, Kyriakos [1 ]
Kudenko, Daniel [1 ]
Affiliations
[1] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England
Keywords
reinforcement learning; MDPs; plan-based; reward shaping
DOI
10.1080/09540091.2014.885283
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper, we compare two such methods: plan-based reward shaping, in which the agent is provided with a plan and extra rewards are given according to the steps of the plan the agent satisfies; and reward shaping via abstract Markov decision processes (MDPs), in which an abstract high-level MDP of the environment is solved and the resulting value function is used to shape the agent. The comparison is conducted in terms of total reward, convergence speed, and scaling up to more complex environments. Empirical results demonstrate the need to correctly select and configure reward shaping methods according to the environment the agents act in. This leads to a more interesting question: is there a reward shaping method that is universally better than all other approaches regardless of the environment dynamics?
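Both schemes can be read as instances of potential-based reward shaping (Ng, Harada, & Russell, 1999), in which the agent receives r + γΦ(s') − Φ(s) in place of the raw environment reward r. The following minimal Python sketch shows how the two potentials described in the abstract might be defined; PlanStep, plan_based_potential, abstract_mdp_potential, and the abstraction mapping are illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass

    GAMMA = 0.99  # discount factor of the ground-level MDP (assumed)

    @dataclass
    class PlanStep:
        """One step of the given high-level plan (hypothetical encoding):
        the step counts as satisfied when its flag appears in the state."""
        flag: str

        def holds_in(self, state):
            return self.flag in state

    def plan_based_potential(state, plan, omega=1.0):
        # Plan-based shaping: Phi(s) grows with the number of plan
        # steps the agent has satisfied, scaled by a weight omega.
        return omega * sum(1 for step in plan if step.holds_in(state))

    def abstract_mdp_potential(state, abstract_value, abstraction):
        # Abstract-MDP shaping: Phi(s) is the value of the abstract
        # state containing s, taken from the value function obtained
        # by solving the high-level abstract MDP.
        return abstract_value[abstraction(state)]

    def shaped_reward(r, state, next_state, phi):
        # Potential-based shaping term F(s, s') = gamma*Phi(s') - Phi(s),
        # which leaves the optimal policy of the ground MDP unchanged.
        return r + GAMMA * phi(next_state) - phi(state)

    # Toy usage: a two-step plan ("pick up key", then "open door").
    plan = [PlanStep("has_key"), PlanStep("door_open")]
    phi = lambda s: plan_based_potential(s, plan)
    # Satisfying the first plan step yields a shaping bonus of ~0.99.
    print(shaped_reward(0.0, frozenset(), frozenset({"has_key"}), phi))

In this reading, the plan-based potential rewards progress along the steps of a single given plan, while the abstract-MDP potential assigns a value to every abstract state via the solved high-level MDP; the paper compares the two in terms of total reward, convergence speed, and scaling.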
Pages: 85-99
Number of pages: 16
Related Papers
50 records in total
  • [1] Plan-based Reward Shaping for Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    2008 4TH INTERNATIONAL IEEE CONFERENCE INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2008: 416-423
  • [2] Overcoming incorrect knowledge in plan-based reward shaping
    Efthymiadis, Kyriakos
    Devlin, Sam
    Kudenko, Daniel
    KNOWLEDGE ENGINEERING REVIEW, 2016, 31(1): 31-43
  • [3] Plan-based reward shaping for multi-agent reinforcement learning
    Devlin, Sam
    Kudenko, Daniel
    KNOWLEDGE ENGINEERING REVIEW, 2016, 31(1): 44-58
  • [4] Plan-based robotic agents
    Beetz, M
    PLAN-BASED CONTROL OF ROBOTIC AGENTS, 2002, 2554: 147-177
  • [5] Plan-Based Intention Revision
    Amos-Binks, Adam
    Young, R. Michael
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018: 8047-8048
  • [6] Plan-Based Expressivism and Innocent Mistakes
    Daskal, Steve
    ETHICS, 2009, 119(2): 310-335
  • [7] Plan-based assistance in the web browser Firefox
    Bertz, Thomas A.
    Reiss, Peter
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007: 622+
  • [8] Potential-Based Reward Shaping for Intrinsic Motivation (Student Abstract)
    Forbes, Grant C.
    Roberts, David L.
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024: 23488-23489
  • [9] Comparison of Provider- and Plan-Based Targeting Strategies for Disease Management
    Annis, Ann M.
    Holtrop, Jodi Summers
    Tao, Min
    Chang, Hsiu-Ching
    Luo, Zhehui
    AMERICAN JOURNAL OF MANAGED CARE, 2015, 21(5): 344-351
  • [10] INTELLIGENT BACKTRACKING IN PLAN-BASED DEDUCTION
    MATWIN, S
    PIETRZYKOWSKI, T
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1985, 7(6): 682-692