A multiagent reinforcement learning algorithm with non-linear dynamics

Citations: 0

Authors:

Abdallah, Sherief [1]

Lesser, Victor [2]

Affiliations:

[1] Faculty of Informatics, British University in Dubai, United Arab Emirates; University of Edinburgh, United Kingdom

[2] Department of Computer Science, University of Massachusetts Amherst, United States

Source:

Journal of Artificial Intelligence Research | 2008 / Volume 33

Abstract:

Several multiagent reinforcement learning (MARL) algorithms have been proposed to optimize agents' decisions. Due to the complexity of the problem, the majority of the previously developed MARL algorithms assumed agents either had some knowledge of the underlying game (such as Nash equilibria) and/or observed other agents' actions and the rewards they received. We introduce a new MARL algorithm called the Weighted Policy Learner (WPL), which allows agents to reach a Nash Equilibrium (NE) in benchmark 2-player-2-action games with minimum knowledge. Using WPL, the only feedback an agent needs is its own local reward (the agent does not observe other agents' actions or rewards). Furthermore, WPL does not assume that agents know the underlying game or the corresponding Nash Equilibrium a priori. We experimentally show that our algorithm converges in benchmark two-player-two-action games. We also show that our algorithm converges in the challenging Shapley's game, where previous MARL algorithms failed to converge without knowing the underlying game or the NE. Furthermore, we show that WPL outperforms the state-of-the-art algorithms in a more realistic setting of 100 agents interacting and learning concurrently. An important aspect of understanding the behavior of a MARL algorithm is analyzing the dynamics of the algorithm: how the policies of multiple learning agents evolve over time as agents interact with one another. Such an analysis not only verifies whether agents using a given MARL algorithm will eventually converge, but also reveals the behavior of the MARL algorithm prior to convergence. We analyze our algorithm in two-player-two-action games and show that symbolically proving WPL's convergence is difficult because of the non-linear nature of WPL's dynamics, unlike previous MARL algorithms that had either linear or piece-wise-linear dynamics. Instead, we numerically solve WPL's dynamics differential equations and compare the solution to the dynamics of previous MARL algorithms. © 2008 AI Access Foundation. All rights reserved.

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification code:

Document type:

Journal article (JA)

Pages: 521-549
