Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Cited by: 0
Authors:
Qiu, Shuang [1]
Wei, Xiaohan [2]
Ye, Jieping [1]
Wang, Zhaoran [3]
Yang, Zhuoran [4]
Affiliations:
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding:
National Science Foundation (USA)
Keywords:
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO;
DOI:
Not available
CLC classification number:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
While single-agent policy optimization in a fixed environment has recently attracted substantial attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
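To make the regret criterion concrete, one common formalization in this literature (the notation below is assumed for illustration, not taken from the record) measures player 1's regret against the best fixed policy in hindsight:

$$\mathrm{Regret}_1(T) \;=\; \max_{\pi} \sum_{t=1}^{T} \Big( V^{\pi,\,\nu_t}(s_1) \;-\; V^{\pi_t,\,\nu_t}(s_1) \Big),$$

where $\pi_t$ and $\nu_t$ are the two players' policies at episode $t$ and $V^{\pi,\nu}(s_1)$ is player 1's value under the joint policy $(\pi, \nu)$. The abstract's guarantee is that this quantity is $\tilde{O}(\sqrt{T})$ for each player.

The sketch below illustrates the two ingredients named in the abstract — fictitious play against the opponent's empirical action frequencies, plus UCB-type optimism over unknown payoffs — in a repeated zero-sum matrix game rather than a full Markov game. It is a minimal illustration under that simplifying assumption; every name and constant in it is ours, not the authors' algorithm.

```python
# Illustrative sketch: optimistic fictitious play in a repeated two-player
# zero-sum *matrix* game -- a simplification of the Markov-game setting in
# the paper. All names and constants here are assumptions, not the authors'
# method.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(4, 4))   # true payoff matrix, unknown to both players

T = 5000
counts = np.ones((4, 4))              # visit counts per joint action (start at 1 to avoid /0)
payoff_sum = np.zeros((4, 4))         # running sum of observed payoffs
row_hist = np.zeros(4)                # fictitious-play counts of the row player's actions
col_hist = np.zeros(4)                # fictitious-play counts of the column player's actions

for t in range(1, T + 1):
    mean = payoff_sum / counts
    bonus = np.sqrt(2 * np.log(t + 1) / counts)   # UCB-style exploration bonus
    # Row player (maximizer): best response, under an optimistic payoff model,
    # to the opponent's empirical (fictitious-play) action frequencies.
    col_mix = (col_hist + 1) / (col_hist.sum() + 4)
    i = np.argmax((mean + bonus) @ col_mix)
    # Column player (minimizer): symmetric, with a pessimistic payoff model.
    row_mix = (row_hist + 1) / (row_hist.sum() + 4)
    j = np.argmin(row_mix @ (mean - bonus))
    r = A[i, j] + 0.1 * rng.standard_normal()     # noisy payoff observation
    counts[i, j] += 1
    payoff_sum[i, j] += r
    row_hist[i] += 1
    col_hist[j] += 1

# The empirical joint play should approach the matrix-game value.
print("empirical value:", (row_hist / T) @ A @ (col_hist / T))
```

Each player best-responds to the opponent's empirical mixture under an optimistically (respectively pessimistically) perturbed payoff model; this is the matrix-game analogue of combining fictitious play with UCB-type optimism, the two components the abstract attributes to the proposed algorithms.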
Pages: 11