Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Cited by: 0
Authors
Li, Gen [1]
Chi, Yuejie [2]
Wei, Yuting [1]
Chen, Yuxin [1]
Affiliations
[1] UPenn, Philadelphia, PA 19104 USA
[2] CMU, Pittsburgh, PA USA
Keywords
COMPLEXITY; BOUNDS;
DOI
not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^{m} A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions of the $i$-th player. This is minimax-optimal (up to a logarithmic factor) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
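The FTRL method invoked in the abstract can be illustrated with a minimal sketch. With an entropy regularizer over the probability simplex, the FTRL update has a closed form equivalent to exponential weights. The function below is an illustrative sketch of plain FTRL only, not the paper's Q-FTRL algorithm; the function name and step-size choice are assumptions for the example.

```python
import numpy as np

def ftrl_entropy_regret(losses, eta):
    """FTRL with an entropy regularizer over the probability simplex.

    With regularizer (1/eta) * sum_a p(a) log p(a), the FTRL step
    argmin_p <p, L_{t-1}> + R(p) has the closed form
    p_t(a) ∝ exp(-eta * L_{t-1}(a)), i.e. exponential weights.

    losses: (T, K) array of per-round, per-action losses in [0, 1].
    Returns the regret of the FTRL plays against the best fixed action.
    """
    T, K = losses.shape
    cum = np.zeros(K)                # cumulative losses L_{t-1}
    incurred = 0.0
    for t in range(T):
        logits = -eta * cum
        p = np.exp(logits - logits.max())   # numerically stable softmax
        p /= p.sum()
        incurred += float(p @ losses[t])    # expected loss of the play
        cum += losses[t]
    return incurred - cum.min()      # regret vs. best action in hindsight
```

With $\eta \approx \sqrt{\log K / T}$ this yields the standard $O(\sqrt{T \log K})$ regret; the refined bound derived in the paper sharpens such guarantees by making variance-type quantities explicit.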
Pages: 15
Related papers (50 total)
  • [21] Multi-Agent Diverse Generative Adversarial Networks
    Ghosh, Arnab
    Kulharia, Viveka
    Namboodiri, Vinay
    Torr, Philip H. S.
    Dokania, Puneet K.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8513 - 8521
  • [22] A Generic Agent Architecture for Cooperative Multi-agent Games
    Marinheiro, Joao
    Cardoso, Henrique Lopes
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2017, : 107 - 118
  • [23] A generative approach for Multi-Agent System development
    Kulesza, U
    Garcia, A
    Lucena, C
    Alencar, P
    SOFTWARE ENGINEERING FOR MULTI-AGENT SYSTEMS III: RESEARCH ISSUES AND PRACTICAL APPLICATIONS, 2004, 3390 : 52 - 69
  • [24] Multi-agent reinforcement learning with approximate model learning for competitive games
    Park, Young Joon
    Cho, Yoon Sang
    Kim, Seoung Bum
    PLOS ONE, 2019, 14 (09):
  • [25] Reinforcement learning model based on regret for multi-agent conflict games
    Department of Computer and Information Technology, Fudan University, Shanghai 200433, China
    Ruan Jian Xue Bao, 2008, 11 (2957-2967):
  • [26] Model Predictive Mean Field Games for Controlling Multi-Agent Systems
    Inoue, Daisuke
    Ito, Yuji
    Kashiwabara, Takahito
    Saito, Norikazu
    Yoshida, Hiroaki
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 982 - 987
  • [27] Utility of Doctrine with Multi-agent RL for Military Engagements
    Basak, Anjon
    Zaroukian, Erin G.
    Corder, Kevin
    Fernandez, Rolando
    Hsu, Christopher D.
    Sharma, Piyush K.
    Waytowich, Nicholas R.
    Asher, Derrik E.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS IV, 2022, 12113
  • [28] Decentralized Optimal Multi-agent System Tracking Control Using Mean Field Games with Heterogeneous Agent
    Zhou, Zejian
    Xu, Hao
    5TH IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (IEEE CCTA 2021), 2021, : 97 - 102
  • [29] Communication in multi-agent Markov decision processes
    Xuan, P
    Lesser, V
    Zilberstein, S
    FOURTH INTERNATIONAL CONFERENCE ON MULTIAGENT SYSTEMS, PROCEEDINGS, 2000, : 467 - 468
  • [30] Finding Friend and Foe in Multi-Agent Games
    Serrino, Jack
    Kleiman-Weiner, Max
    Parkes, David C.
    Tenenbaum, Joshua B.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32