Exploring selfish reinforcement learning in repeated games with stochastic rewards

Cited by: 0
Authors
Katja Verbeeck
Ann Nowé
Johan Parent
Karl Tuyls
Affiliations
[1] Vrije Universiteit Brussel, Computational Modeling Lab (COMO)
[2] University of Maastricht, Institute for Knowledge and Agent Technology (IKAT)
Keywords
Multi-agent reinforcement learning; Learning automata; Non-zero sum games
DOI
Not available
Abstract
In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, for common interest and conflicting interest games respectively. Both are based on the same idea: an agent explores by temporarily excluding some of the local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. In a later stage these two algorithms are combined into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. Importantly, ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards; they are flexible in learning different solution concepts; and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive load balancing of parallel applications, is also included.
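To make the coordinated exploration idea in the abstract concrete, the following is a minimal Python sketch of the common interest case. It assumes two independent agents driven by linear reward-inaction learning automata that alternate a convergence phase with an exclusion phase; the names (LearningAutomaton, esrl_common_interest, alpha, phases, steps, evals) and the fixed phase lengths are illustrative assumptions, not details taken from the paper.

import random

# Illustrative sketch of ESRL-style coordinated exploration for a repeated
# two-agent common interest game; not the paper's exact algorithm.

class LearningAutomaton:
    """Linear reward-inaction automaton over a private (reducible) action set."""

    def __init__(self, n_actions, alpha=0.05):
        self.active = list(range(n_actions))   # local actions not yet excluded
        self.alpha = alpha
        self.reset()

    def reset(self):
        # Uniform action probabilities over the currently active actions.
        self.probs = {a: 1.0 / len(self.active) for a in self.active}

    def choose(self):
        r, acc = random.random(), 0.0
        for action, p in self.probs.items():
            acc += p
            if r <= acc:
                return action
        return self.active[-1]

    def update(self, action, reward):
        # Reward-inaction update: probabilities shift in proportion to the reward.
        for a in self.active:
            if a == action:
                self.probs[a] += self.alpha * reward * (1.0 - self.probs[a])
            else:
                self.probs[a] -= self.alpha * reward * self.probs[a]

    def exclude(self, action):
        # Temporarily remove the converged action from the private action space,
        # so the team searches a reduced joint action space in the next phase.
        if len(self.active) > 1:
            self.active.remove(action)
        self.reset()


def esrl_common_interest(payoff, n_actions, phases=3, steps=2000, evals=200):
    # payoff(a0, a1) returns a stochastic common reward in [0, 1].
    agents = [LearningAutomaton(n_actions), LearningAutomaton(n_actions)]
    best_joint, best_value = None, -1.0
    for _ in range(phases):
        # Convergence phase: play until the automata (roughly) settle.
        for _ in range(steps):
            joint = tuple(agent.choose() for agent in agents)
            reward = payoff(*joint)
            for agent, action in zip(agents, joint):
                agent.update(action, reward)
        # Estimate the value of the joint action the team converged to.
        joint = tuple(max(agent.probs, key=agent.probs.get) for agent in agents)
        value = sum(payoff(*joint) for _ in range(evals)) / evals
        if value > best_value:
            best_joint, best_value = joint, value
        # Exploration phase: each agent privately excludes its converged action.
        for agent, action in zip(agents, joint):
            agent.exclude(action)
    return best_joint, best_value


if __name__ == "__main__":
    # Hypothetical 3x3 common interest game with Bernoulli rewards: the optimal
    # joint action (1, 1) can be missed without coordinated exploration because
    # of the competing equilibrium at (2, 2).
    means = [[0.2, 0.0, 0.1],
             [0.0, 0.9, 0.3],
             [0.1, 0.3, 0.7]]
    payoff = lambda a0, a1: 1.0 if random.random() < means[a0][a1] else 0.0
    print(esrl_common_interest(payoff, n_actions=3))

The conflicting interest and generic variants described in the abstract would extend this loop with the limited communication needed to alternate between the agents' preferred joint actions; they are not sketched here.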
Pages: 239-269 (30 pages)
Related papers
50 records in total (selection below)
  • [1] Exploring selfish reinforcement learning in repeated games with stochastic rewards
    Verbeeck, Katja
    Nowé, Ann
    Parent, Johan
    Tuyls, Karl
    [J]. AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2007, 14 (03) : 239 - 269
  • [2] Evolutionary instability of selfish learning in repeated games
    McAvoy, Alex
    Kates-Harbeck, Julian
    Chatterjee, Krishnendu
    Hilbe, Christian
    [J]. PNAS NEXUS, 2022, 1 (04):
  • [3] RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation
    Parnell, Jacob
    Unanue, Inigo Jauregi
    Piccardi, Massimo
    [J]. SPNLP 2021: THE 5TH WORKSHOP ON STRUCTURED PREDICTION FOR NLP, 2021, : 1 - 11
  • [4] Online Reinforcement Learning in Stochastic Games
    Wei, Chen-Yu
    Hong, Yi-Te
    Lu, Chi-Jen
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards
    Pena, Luis
    LaTorre, Antonio
    Pena, Jose-Maria
    Ossowski, Sascha
    [J]. HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2009, 5572 : 336 - 343
  • [6] Learning in repeated stochastic network aggregative games
    Meigs, Emily
    Parise, Francesca
    Ozdaglar, Asuman
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 6918 - 6923
  • [7] A reinforcement learning approach to stochastic business games
    Ravulapati, KK
    Rao, J
    Das, TK
    [J]. IIE TRANSACTIONS, 2004, 36 (04) : 373 - 385
  • [8] Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning
    Crandall, Jacob W.
    Goodrich, Michael A.
    [J]. MACHINE LEARNING, 2011, 82 (03) : 281 - 314
  • [9] Recursive stochastic games with positive rewards
    Etessami, Kousha
    Wojtczak, Dominik
    Yannakakis, Mihalis
    [J]. THEORETICAL COMPUTER SCIENCE, 2019, 777 : 308 - 328