Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Cited by: 0
Authors
Li, Gen [1 ]
Chi, Yuejie [2 ]
Wei, Yuting [1 ]
Chen, Yuxin [1 ]
Affiliations
[1] University of Pennsylvania, Philadelphia, PA 19104 USA
[2] Carnegie Mellon University, Pittsburgh, PA USA
Keywords
COMPLEXITY; BOUNDS
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL, together with an adaptive sampling scheme, that leverages the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^{m} A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions of the $i$-th player. This bound is minimax-optimal (up to logarithmic factors) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
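This record does not include the Q-FTRL pseudocode. As a rough point of reference for the FTRL subroutine the abstract alludes to, below is a minimal sketch of the Follow-the-Regularized-Leader update with an entropic regularizer over the action simplex, which admits the familiar exponential-weights closed form. The function name ftrl_entropy and the learning rate eta are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ftrl_entropy(reward_history, eta):
    """FTRL update with an entropic regularizer over the action simplex.

    Maximizes <x, R_t> - (1/eta) * sum_a x_a * log(x_a) over probability
    vectors x, where R_t is the cumulative reward vector observed so far.
    With this regularizer the maximizer has the exponential-weights
    closed form: x_a proportional to exp(eta * R_t[a]).
    (Illustrative sketch; not the paper's Q-FTRL algorithm.)
    """
    cumulative = np.sum(reward_history, axis=0)  # R_t: per-action cumulative reward
    logits = eta * cumulative
    logits -= logits.max()                       # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage: 3 actions, 2 rounds of adversarially chosen rewards.
history = [np.array([1.0, 0.0, 0.5]), np.array([0.2, 0.9, 0.1])]
print(ftrl_entropy(history, eta=0.5))  # distribution tilted toward high cumulative reward
```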
Pages: 15
Related Papers
50 records in total
  • [1] Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
    Zhang, Kaiqing
    Kakade, Sham M.
    Basar, Tamer
    Yang, Lin F.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [2] Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
    Zhang, Kaiqing
    Kakade, Sham M.
    Basar, Tamer
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [3] Passivity, RL and Learning in Multi-Agent Games
    Pavel, Lacra
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 4383 - 4383
  • [4] A multi-agent coordination framework based on Markov games
    Fan, B
    Pan, Q
    Zhang, HC
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOL 2, 2004, : 230 - 233
  • [5] Bayesian Optimization for Multi-Agent Routing in Markov Games
    Shou, Zhenyu
    Chen, Xu
    Di, Xuan
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 3993 - 3998
  • [6] Learning automata based multi-agent system algorithms for finding optimal policies in Markov games
    Masoumi, Behrooz
    Meybodi, Mohammad Reza
    ASIAN JOURNAL OF CONTROL, 2012, 14 (01) : 137 - 152
  • [7] MODEL ASSISTED VARIABLE CLUSTERING: MINIMAX-OPTIMAL RECOVERY AND ALGORITHMS
    Bunea, Florentina
    Giraud, Christophe
    Luo, Xi
    Royer, Martin
    Verzelen, Nicolas
    ANNALS OF STATISTICS, 2020, 48 (01): : 111 - 137
  • [8] Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning
    Leon, Borja G.
    Belardinelli, Francesco
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 139 - 146
  • [9] A Multi-agent Generative Model for Collaborative Global Routing Refinement
    Wang, Qijing
    Liu, Jinwei
    Wong, Martin D. F.
    Young, Evangeline F. Y.
PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 383 - 389
  • [10] Optimal distributed containment control for nonlinear multi-agent graphical games
    Yu, Di
    Luo, Huafeng
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 3817 - 3822