Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Cited by: 0
Authors
Li, Gen [1 ]
Chi, Yuejie [2 ]
Wei, Yuting [1 ]
Chen, Yuxin [1 ]
Affiliations
[1] University of Pennsylvania, Philadelphia, PA 19104 USA
[2] Carnegie Mellon University, Pittsburgh, PA USA
Keywords
COMPLEXITY; BOUNDS
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL, together with an adaptive sampling scheme, that leverages the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^{m} A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions of the $i$-th player. This bound is minimax-optimal (up to logarithmic factors) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
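This record does not include the Q-FTRL pseudocode. As a rough point of reference for the FTRL subroutine the abstract alludes to, below is a minimal sketch of the Follow-the-Regularized-Leader update with an entropic regularizer over the action simplex, which admits the familiar exponential-weights closed form. The function name ftrl_entropy and the learning rate eta are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ftrl_entropy(reward_history, eta):
    """FTRL update with an entropic regularizer over the action simplex.

    Maximizes <x, R_t> - (1/eta) * sum_a x_a * log(x_a) over probability
    vectors x, where R_t is the cumulative reward vector observed so far.
    With this regularizer the maximizer has the exponential-weights
    closed form: x_a proportional to exp(eta * R_t[a]).
    (Illustrative sketch; not the paper's Q-FTRL algorithm.)
    """
    cumulative = np.sum(reward_history, axis=0)  # R_t: per-action cumulative reward
    logits = eta * cumulative
    logits -= logits.max()                       # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage: 3 actions, 2 rounds of adversarially chosen rewards.
history = [np.array([1.0, 0.0, 0.5]), np.array([0.2, 0.9, 0.1])]
print(ftrl_entropy(history, eta=0.5))  # distribution tilted toward high cumulative reward
```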
Pages: 15
Related Papers
50 records in total
  • [1] Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
    Zhang, Kaiqing
    Kakade, Sham M.
    Basar, Tamer
    Yang, Lin F.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [2] Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
    Zhang, Kaiqing
    Kakade, Sham M.
    Basar, Tamer
    Yang, Lin F.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [3] Passivity, RL and Learning in Multi-Agent Games
    Pavel, Lacra
    2023 AMERICAN CONTROL CONFERENCE, ACC, 2023, : 4383 - 4383
  • [4] A multi-agent coordination framework based on Markov games
    Fan, B
    Pan, Q
    Zhang, HC
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOL 2, 2004, : 230 - 233
  • [5] Bayesian Optimization for Multi-Agent Routing in Markov Games
    Shou, Zhenyu
    Chen, Xu
    Di, Xuan
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 3993 - 3998
  • [6] Learning automata based multi-agent system algorithms for finding optimal policies in Markov games
    Masoumi, Behrooz
    Meybodi, Mohammad Reza
    ASIAN JOURNAL OF CONTROL, 2012, 14 (01) : 137 - 152
  • [7] MODEL ASSISTED VARIABLE CLUSTERING: MINIMAX-OPTIMAL RECOVERY AND ALGORITHMS
    Bunea, Florentina
    Giraud, Christophe
    Luo, Xi
    Royer, Martin
    Verzelen, Nicolas
    ANNALS OF STATISTICS, 2020, 48 (01): : 111 - 137
  • [8] Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning
    Leon, Borja G.
    Belardinelli, Francesco
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 139 - 146
  • [9] A Multi-agent Generative Model for Collaborative Global Routing Refinement
    Wang, Qijing
    Liu, Jinwei
    Wong, Martin D. F.
    Young, Evangeline F. Y.
PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 383 - 389
  • [10] Optimal distributed containment control for nonlinear multi-agent graphical games
    Yu, Di
    Luo, Huafeng
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018, : 3817 - 3822