Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model

Cited by: 0
Authors
Li, Gen [1 ]
Chi, Yuejie [2 ]
Wei, Yuting [1 ]
Chen, Yuxin [1 ]
Affiliations
[1] UPenn, Philadelphia, PA 19104 USA
[2] CMU, Pittsburgh, PA USA
Keywords
COMPLEXITY; BOUNDS;
DOI
N/A
CLC number (Chinese Library Classification)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies multi-agent reinforcement learning in Markov games, with the goal of learning Nash equilibria or coarse correlated equilibria (CCEs) sample-optimally. All prior results suffer from at least one of two obstacles: the curse of multiple agents and the barrier of long horizon, regardless of the sampling protocol in use. We take a step towards settling this problem, assuming access to a flexible sampling mechanism: the generative model. Focusing on non-stationary finite-horizon Markov games, we develop a fast learning algorithm called Q-FTRL, together with an adaptive sampling scheme, that leverages the optimism principle in online adversarial learning (particularly the Follow-the-Regularized-Leader (FTRL) method). Our algorithm learns an $\varepsilon$-approximate CCE in a general-sum Markov game using $\widetilde{O}\big(H^4 S \sum_{i=1}^{m} A_i / \varepsilon^2\big)$ samples, where $m$ is the number of players, $S$ is the number of states, $H$ is the horizon, and $A_i$ denotes the number of actions of the $i$-th player. This is minimax-optimal (up to log factors) when $m$ is fixed. When applied to two-player zero-sum Markov games, our algorithm provably finds an $\varepsilon$-approximate Nash equilibrium with a minimal number of samples. Along the way, we derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities, which may be of independent interest.
Pages: 15
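
The abstract builds on the Follow-the-Regularized-Leader (FTRL) method from online adversarial learning. For reference, the sketch below shows the standard FTRL update with a negative-entropy regularizer, which has the closed-form exponential-weights solution, under full-information feedback. This illustrates only the generic FTRL primitive, not the paper's Q-FTRL algorithm or its adaptive sampling scheme; the function name, learning rate, and test losses are hypothetical choices.

```python
# A minimal sketch of the FTRL primitive the abstract refers to, assuming a
# negative-entropy regularizer and full-information feedback. This is NOT the
# paper's Q-FTRL algorithm; it only illustrates the underlying online-learning
# update that Q-FTRL builds on.
import numpy as np

def ftrl_entropic(loss_matrix, eta=0.1):
    """Run FTRL with a negative-entropy regularizer over T rounds.

    loss_matrix: (T, A) array; row t holds the loss of each action at round t.
    eta: learning rate; the regularizer is weighted by 1/eta.
    Returns the sequence of played distributions, shape (T, A).
    """
    T, A = loss_matrix.shape
    cum_loss = np.zeros(A)           # running sum of observed loss vectors
    plays = np.empty((T, A))
    for t in range(T):
        # argmin_p <p, cum_loss> + (1/eta) * sum_a p_a log p_a
        # has the closed form p propto exp(-eta * cum_loss).
        logits = -eta * cum_loss
        logits -= logits.max()       # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        plays[t] = p
        cum_loss += loss_matrix[t]   # full-information feedback
    return plays

# Tiny usage example: 3 actions, action 0 is best on average, so the FTRL
# distribution should concentrate on it over time.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.uniform(size=(500, 3))
    losses[:, 0] *= 0.5              # make action 0 cheaper on average
    dists = ftrl_entropic(losses, eta=0.2)
    print(dists[-1])                 # weight should be highest on action 0
```

The paper's refined regret analysis of FTRL (the variance-dependent bound mentioned in the abstract) concerns exactly this kind of update; Q-FTRL applies it per state and per step of the Markov game with carefully estimated losses.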