Plan-based reward shaping for multi-agent reinforcement learning

Cited by: 14
Authors
Devlin, Sam [1]
Kudenko, Daniel [1]
Affiliations
[1] Univ York, Dept Comp Sci, York YO10 5GH, N Yorkshire, England
Source
KNOWLEDGE ENGINEERING REVIEW | 2016 / Vol. 31 / Issue 01
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1017/S0269888915000181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL. Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.
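As a rough illustration of the idea in the abstract, potential-based reward shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward. The sketch below is not the authors' implementation. It assumes a plan is given as an ordered list of abstract states and that the potential is the index of the matching plan step times a scale factor, with off-plan states given potential 0. The names plan_potential, shaping_reward, and scale are hypothetical.

```python
from typing import Hashable, Sequence


def plan_potential(abstract_state: Hashable,
                   plan: Sequence[Hashable],
                   scale: float = 1.0) -> float:
    """Assumed potential: scale * index of the latest matching plan step.

    States that match no plan step get potential 0 (a simplifying assumption).
    """
    for step in range(len(plan) - 1, -1, -1):
        if plan[step] == abstract_state:
            return scale * step
    return 0.0


def shaping_reward(phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


# Hypothetical three-step plan over abstract states.
plan = ["at_start", "has_key", "door_open"]

# Moving one step forward along the plan yields a positive bonus.
bonus = shaping_reward(plan_potential("at_start", plan),
                       plan_potential("has_key", plan),
                       gamma=1.0)
```

In a multi-agent setting, each agent would add this term to its own reward, using either its individual plan or its part of a joint plan as the source of the potential.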
Pages: 44 - 58
Page count: 15