A comparison of plan-based and abstract MDP reward shaping

Cited by: 2
Authors
Efthymiadis, Kyriakos [1 ]
Kudenko, Daniel [1 ]
Affiliations
[1] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England
Keywords
reinforcement learning; MDPs; plan-based; reward shaping
DOI
10.1080/09540091.2014.885283
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention shifts away from tabula-rasa approaches, many different reward shaping methods have been developed. In this paper, we compare two such methods: plan-based reward shaping, in which the agent is provided with a plan and extra rewards are given according to the steps of the plan the agent satisfies; and reward shaping via abstract Markov decision processes (MDPs), in which an abstract high-level MDP of the environment is solved and the resulting value function is used to shape the agent. The comparison is conducted in terms of total reward, convergence speed, and scaling up to more complex environments. Empirical results demonstrate the need to correctly select and configure reward shaping methods according to the environment the agents act in. This leads to a more interesting question: is there a reward shaping method that is universally better than all other approaches regardless of the environment dynamics?
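Both schemes can be read as instances of potential-based reward shaping (Ng, Harada, & Russell, 1999), in which the agent receives r + γΦ(s') − Φ(s) in place of the raw environment reward r. The following minimal Python sketch shows how the two potentials described in the abstract might be defined; PlanStep, plan_based_potential, abstract_mdp_potential, and the abstraction mapping are illustrative assumptions, not the authors' implementation.

    from dataclasses import dataclass

    GAMMA = 0.99  # discount factor of the ground-level MDP (assumed)

    @dataclass
    class PlanStep:
        """One step of the given high-level plan (hypothetical encoding):
        the step counts as satisfied when its flag appears in the state."""
        flag: str

        def holds_in(self, state):
            return self.flag in state

    def plan_based_potential(state, plan, omega=1.0):
        # Plan-based shaping: Phi(s) grows with the number of plan
        # steps the agent has satisfied, scaled by a weight omega.
        return omega * sum(1 for step in plan if step.holds_in(state))

    def abstract_mdp_potential(state, abstract_value, abstraction):
        # Abstract-MDP shaping: Phi(s) is the value of the abstract
        # state containing s, taken from the value function obtained
        # by solving the high-level abstract MDP.
        return abstract_value[abstraction(state)]

    def shaped_reward(r, state, next_state, phi):
        # Potential-based shaping term F(s, s') = gamma*Phi(s') - Phi(s),
        # which leaves the optimal policy of the ground MDP unchanged.
        return r + GAMMA * phi(next_state) - phi(state)

    # Toy usage: a two-step plan ("pick up key", then "open door").
    plan = [PlanStep("has_key"), PlanStep("door_open")]
    phi = lambda s: plan_based_potential(s, plan)
    # Satisfying the first plan step yields a shaping bonus of ~0.99.
    print(shaped_reward(0.0, frozenset(), frozenset({"has_key"}), phi))

In this reading, the plan-based potential rewards progress along the steps of a single given plan, while the abstract-MDP potential assigns a value to every abstract state via the solved high-level MDP; the paper compares the two in terms of total reward, convergence speed, and scaling.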
Pages: 85-99
Number of pages: 16
Related Papers
50 records in total
  • [1] Plan-based Reward Shaping for Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    2008 4TH INTERNATIONAL IEEE CONFERENCE INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2008: 416-423
  • [2] Overcoming incorrect knowledge in plan-based reward shaping
    Efthymiadis, Kyriakos
    Devlin, Sam
    Kudenko, Daniel
    KNOWLEDGE ENGINEERING REVIEW, 2016, 31(1): 31-43
  • [3] Plan-based reward shaping for multi-agent reinforcement learning
    Devlin, Sam
    Kudenko, Daniel
    KNOWLEDGE ENGINEERING REVIEW, 2016, 31(1): 44-58
  • [4] Plan-based robotic agents
    Beetz, M
    PLAN-BASED CONTROL OF ROBOTIC AGENTS, 2002, 2554: 147-177
  • [5] Plan-Based Intention Revision
    Amos-Binks, Adam
    Young, R. Michael
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018: 8047-8048
  • [6] Plan-Based Expressivism and Innocent Mistakes
    Daskal, Steve
    ETHICS, 2009, 119(2): 310-335
  • [7] Plan-based assistance in the web browser Firefox
    Bertz, Thomas A.
    Reiss, Peter
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007: 622+
  • [8] Potential-Based Reward Shaping for Intrinsic Motivation (Student Abstract)
    Forbes, Grant C.
    Roberts, David L.
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024: 23488-23489
  • [9] Comparison of Provider- and Plan-Based Targeting Strategies for Disease Management
    Annis, Ann M.
    Holtrop, Jodi Summers
    Tao, Min
    Chang, Hsiu-Ching
    Luo, Zhehui
    AMERICAN JOURNAL OF MANAGED CARE, 2015, 21(5): 344-351
  • [10] INTELLIGENT BACKTRACKING IN PLAN-BASED DEDUCTION
    MATWIN, S
    PIETRZYKOWSKI, T
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1985, 7(6): 682-692