Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Cited by: 0
Authors
Li, Gen [1]
Chi, Yuejie [2]
Wei, Yuting [1]
Chen, Yuxin [1]
Affiliations
[1] UPenn, Philadelphia, PA 19104 USA
[2] CMU, Pittsburgh, PA USA
Keywords
COMPLEXITY; BOUNDS;
DOI
not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCE) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL and an adaptive sampling scheme that leverage the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^{m} A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions of the $i$-th player. This is minimax-optimal (up to a logarithmic factor) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
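The FTRL method invoked in the abstract can be illustrated with a minimal sketch. With an entropy regularizer over the probability simplex, the FTRL update has a closed form equivalent to exponential weights. The function below is an illustrative sketch of plain FTRL only, not the paper's Q-FTRL algorithm; the function name and step-size choice are assumptions for the example.

```python
import numpy as np

def ftrl_entropy_regret(losses, eta):
    """FTRL with an entropy regularizer over the probability simplex.

    With regularizer (1/eta) * sum_a p(a) log p(a), the FTRL step
    argmin_p <p, L_{t-1}> + R(p) has the closed form
    p_t(a) ∝ exp(-eta * L_{t-1}(a)), i.e. exponential weights.

    losses: (T, K) array of per-round, per-action losses in [0, 1].
    Returns the regret of the FTRL plays against the best fixed action.
    """
    T, K = losses.shape
    cum = np.zeros(K)                # cumulative losses L_{t-1}
    incurred = 0.0
    for t in range(T):
        logits = -eta * cum
        p = np.exp(logits - logits.max())   # numerically stable softmax
        p /= p.sum()
        incurred += float(p @ losses[t])    # expected loss of the play
        cum += losses[t]
    return incurred - cum.min()      # regret vs. best action in hindsight
```

With $\eta \approx \sqrt{\log K / T}$ this yields the standard $O(\sqrt{T \log K})$ regret; the refined bound derived in the paper sharpens such guarantees by making variance-type quantities explicit.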
Pages: 15
Related papers (50 total)
  • [21] Multi-Agent Diverse Generative Adversarial Networks
    Ghosh, Arnab
    Kulharia, Viveka
    Namboodiri, Vinay
    Torr, Philip H. S.
    Dokania, Puneet K.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8513 - 8521
  • [22] A Generic Agent Architecture for Cooperative Multi-agent Games
    Marinheiro, Joao
    Cardoso, Henrique Lopes
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2017, : 107 - 118
  • [23] A generative approach for Multi-Agent System development
    Kulesza, U
    Garcia, A
    Lucena, C
    Alencar, P
    SOFTWARE ENGINEERING FOR MULTI-AGENT SYSTEMS III: RESEARCH ISSUES AND PRACTICAL APPLICATIONS, 2004, 3390 : 52 - 69
  • [24] Multi-agent reinforcement learning with approximate model learning for competitive games
    Park, Young Joon
    Cho, Yoon Sang
    Kim, Seoung Bum
    PLOS ONE, 2019, 14 (09):
  • [25] Reinforcement learning model based on regret for multi-agent conflict games
    Department of Computer and Information Technology, Fudan University, Shanghai 200433, China
    Ruan Jian Xue Bao, 2008, 11 (2957-2967):
  • [26] Model Predictive Mean Field Games for Controlling Multi-Agent Systems
    Inoue, Daisuke
    Ito, Yuji
    Kashiwabara, Takahito
    Saito, Norikazu
    Yoshida, Hiroaki
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 982 - 987
  • [27] Utility of Doctrine with Multi-agent RL for Military Engagements
    Basak, Anjon
    Zaroukian, Erin G.
    Corder, Kevin
    Fernandez, Rolando
    Hsu, Christopher D.
    Sharma, Piyush K.
    Waytowich, Nicholas R.
    Asher, Derrik E.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS IV, 2022, 12113
  • [28] Decentralized Optimal Multi-agent System Tracking Control Using Mean Field Games with Heterogeneous Agent
    Zhou, Zejian
    Xu, Hao
    5TH IEEE CONFERENCE ON CONTROL TECHNOLOGY AND APPLICATIONS (IEEE CCTA 2021), 2021, : 97 - 102
  • [29] Communication in multi-agent Markov decision processes
    Xuan, P
    Lesser, V
    Zilberstein, S
    FOURTH INTERNATIONAL CONFERENCE ON MULTIAGENT SYSTEMS, PROCEEDINGS, 2000, : 467 - 468
  • [30] Finding Friend and Foe in Multi-Agent Games
    Serrino, Jack
    Kleiman-Weiner, Max
    Parkes, David C.
    Tenenbaum, Joshua B.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32