A multiagent reinforcement learning algorithm with non-linear dynamics

Authors
Abdallah, Sherief [1]
Lesser, Victor [2]
Affiliations
[1] Faculty of Informatics, British University in Dubai, United Arab Emirates; University of Edinburgh, United Kingdom
[2] Department of Computer Science, University of Massachusetts Amherst, United States
Abstract
Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of the previously developed MARL algorithms assumed agents either had some knowledge of the underlying game (such as Nash equilibria) and/or observed other agents' actions and the rewards they received. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash Equilibrium (NE) in benchmark 2-player-2-action games with minimum knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash Equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently. An important aspect of understanding the behavior of a MARL algorithm is analyzing the dynamics of the algorithm: how the policies of multiple learning agents evolve over time as agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the MARL algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult because of the non-linear nature of WPL's dynamics, unlike previous MARL algorithms that had either linear or piece-wise-linear dynamics. Instead, we numerically solve WPL's dynamics differential equations and compare the solution to the dynamics of previous MARL algorithms. © 2008 AI Access Foundation. All rights reserved.
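To make "numerically solving the dynamics" concrete, the following is a minimal sketch, not the authors' code. It integrates a WPL-style policy update with forward Euler steps in the two-player-two-action matching pennies game. The weighting rule used here (scale a negative gradient by the action probability and a positive gradient by one minus that probability) is how the WPL update is usually stated, but it does not appear in this record, so treat it as an assumption. The function names, step size, step count, starting policies, and the choice of matching pennies as the benchmark are likewise illustrative choices. The sketch also uses exact payoff gradients, whereas the abstract describes agents that only see their own reward.

```python
def payoff_gradients(p, q, row_payoff, col_payoff):
    """Gradient of each player's expected payoff with respect to its own
    probability of playing action 0 (p for the row player, q for the column)."""
    (r00, r01), (r10, r11) = row_payoff
    (c00, c01), (c10, c11) = col_payoff
    grad_p = q * (r00 - r10) + (1 - q) * (r01 - r11)
    grad_q = p * (c00 - c01) + (1 - p) * (c10 - c11)
    return grad_p, grad_q


def wpl_weight(prob, grad):
    """Assumed WPL weighting: slow down as the policy nears the boundary
    it is moving toward (prob -> 0 when decreasing, prob -> 1 when increasing)."""
    return prob if grad < 0 else 1.0 - prob


def simulate_wpl(p, q, row_payoff, col_payoff, step=0.001, n_steps=50_000):
    """Forward-Euler integration of the assumed WPL dynamics."""
    for _ in range(n_steps):
        grad_p, grad_q = payoff_gradients(p, q, row_payoff, col_payoff)
        new_p = p + step * grad_p * wpl_weight(p, grad_p)
        new_q = q + step * grad_q * wpl_weight(q, grad_q)
        # Clip to keep both values valid probabilities.
        p = min(1.0, max(0.0, new_p))
        q = min(1.0, max(0.0, new_q))
    return p, q


# Matching pennies: the row player wins on a match, the column player otherwise.
MATCHING_PENNIES_ROW = ((1, -1), (-1, 1))
MATCHING_PENNIES_COL = ((-1, 1), (1, -1))

final_p, final_q = simulate_wpl(0.9, 0.2, MATCHING_PENNIES_ROW, MATCHING_PENNIES_COL)
print(f"final policies: p={final_p:.3f}, q={final_q:.3f}")
```

Under these assumptions the two policies spiral in toward the mixed equilibrium at (0.5, 0.5). The product of the gradient and the policy-dependent weight is what makes the dynamics non-linear, which is why a numerical solution is a natural fallback when a symbolic proof is hard.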
DOI
Not available
Document type
Journal article (JA)
Pages: 521-549
Related papers
  • [1] A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics
    Abdallah, Sherief
    Lesser, Victor
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2008, 33 : 521 - 549
  • [2] An improved multiagent reinforcement learning algorithm
    Meng, XP
    Babuska, R
    Busoniu, L
    Chen, Y
    Tan, WY
    [J]. 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Proceedings, 2005, : 337 - 343
  • [3] Gaussian processes non-linear inverse reinforcement learning
    Qiao, Qifeng
    Lin, Xiaomin
    [J]. IET CYBER-SYSTEMS AND ROBOTICS, 2021, 3 (02) : 150 - 163
  • [4] The dynamics of reinforcement learning in cooperative multiagent systems
    Claus, C
    Boutilier, C
    [J]. FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, 1998, : 746 - 752
  • [5] The quantum cartpole: A benchmark environment for non-linear reinforcement learning
    Meinerz, Kai
    Trebst, Simon
    Rudner, Mark
    van Nieuwenburg, Evert
    [J]. SCIPOST PHYSICS CORE, 2024, 7 (02):
  • [6] Gaussian Based Non-linear Function Approximation for Reinforcement Learning
    Haider A.
    Hawe G.
    Wang H.
    Scotney B.
    [J]. SN Computer Science, 2021, 2 (3)
  • [7] A generalized algorithm framework for non-linear structural dynamics
    Papazafeiropoulos, George
    Plevris, Vagelis
    Papadrakakis, Manolis
    [J]. BULLETIN OF EARTHQUAKE ENGINEERING, 2017, 15 (01) : 411 - 441
  • [9] A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
    Kim, Dong-Ki
    Liu, Miao
    Riemer, Matthew
    Sun, Chuangchuang
    Abdulhai, Marwa
    Habibi, Golnaz
    Lopez-Cot, Sebastian
    Tesauro, Gerald
    How, Jonathan P.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [10] A new hybrid learning algorithm for non-linear boundaries
    Wang, CH
    Hong, TP
    Tseng, SS
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 1998, 14 (02) : 305 - 325