Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Cited by: 0
Authors
Qiu, Shuang [1 ]
Wei, Xiaohan [2 ]
Ye, Jieping [1 ]
Wang, Zhaoran [3 ]
Yang, Zhuoran [4 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While single-agent policy optimization in a fixed environment has recently attracted substantial research attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
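To make the guarantee concrete: writing $\pi_t$ and $\nu_t$ for the two players' policies in episode $t$ and $V^{\pi,\nu}(s_1)$ for the max player's value from the initial state (notation assumed here for exposition, not taken from the paper), the max player's regret against a best fixed policy in hindsight is

$\mathrm{Reg}(T) = \max_{\pi} \sum_{t=1}^{T} \big( V^{\pi,\nu_t}(s_1) - V^{\pi_t,\nu_t}(s_1) \big),$

and the bound above says this grows only as $\tilde{O}(\sqrt{T})$. The interplay of UCB-type optimism and fictitious play can be illustrated on a far simpler stand-in problem. The sketch below runs fictitious play with a Hoeffding-style bonus on a two-player zero-sum matrix game whose payoff matrix is unknown and observed with noise; the rock-paper-scissors payoffs, bonus constant, noise level, and all variable names are illustrative assumptions, not the paper's algorithm, which operates on Markov games with factored-independent or single-controller transitions.

```python
# Minimal sketch (assumptions noted above, NOT the paper's algorithm):
# fictitious play with a UCB-style optimism bonus on a zero-sum matrix game.
import numpy as np

rng = np.random.default_rng(0)

# True payoff to the max player (rock-paper-scissors; unknown to the players).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
n_max, n_min = A.shape

counts = np.zeros((n_max, n_min))       # visits to each joint action pair
payoff_sum = np.zeros((n_max, n_min))   # accumulated noisy payoffs
max_action_counts = np.ones(n_max)      # empirical action counts (max player)
min_action_counts = np.ones(n_min)      # empirical action counts (min player)

T = 5000
for t in range(1, T + 1):
    # Optimistic payoff estimates: empirical mean plus/minus a UCB bonus,
    # so each player is optimistic in its own favorable direction.
    mean = payoff_sum / np.maximum(counts, 1)
    bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    q_max = mean + bonus
    q_min = mean - bonus

    # Fictitious play: best-respond to the opponent's empirical mixture.
    mix_min = min_action_counts / min_action_counts.sum()
    mix_max = max_action_counts / max_action_counts.sum()
    a = int(np.argmax(q_max @ mix_min))   # max player's best response
    b = int(np.argmin(mix_max @ q_min))   # min player's best response

    # Play the joint action, observe a noisy payoff, update statistics.
    r = A[a, b] + rng.normal(scale=0.1)
    counts[a, b] += 1
    payoff_sum[a, b] += r
    max_action_counts[a] += 1
    min_action_counts[b] += 1

print("max player's empirical mixture:", max_action_counts / max_action_counts.sum())
print("min player's empirical mixture:", min_action_counts / min_action_counts.sum())
```

In rock-paper-scissors the unique Nash equilibrium is the uniform mixture, so both empirical mixtures should drift toward (1/3, 1/3, 1/3) as $T$ grows; the paper establishes the analogous $\tilde{O}(\sqrt{T})$ guarantee in the much richer Markov-game setting.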
Pages: 11
Related Papers
50 records in total
  • [1] Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
    Zhao, Yulai
    Tian, Yuandong
    Lee, Jason D.
    Du, Simon S.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [2] FICTITIOUS PLAY IN ZERO-SUM STOCHASTIC GAMES
    Sayin, Muhammed O.
    Parise, Francesca
    Ozdaglar, Asuman
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2022, 60 (04) : 2095 - 2114
  • [3] Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
    Zhang, Kaiqing
    Yang, Zhuoran
    Basar, Tamer
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Smooth policy iteration for zero-sum Markov Games
    Lyu, Yao
    Wang, Wenxuan
    Li, Shengbo Eben
    Li, Zeyang
    Duan, Jingliang
    NEUROCOMPUTING, 2025, 630
  • [5] A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
    Xiong, Wei
    Zhong, Han
    Shi, Chengshuai
    Shen, Cong
    Zhang, Tong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] ZERO-SUM MARKOV GAMES WITH IMPULSE CONTROLS
    Basu, Arnab
    Stettner, Lukasz
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2020, 58 (01) : 580 - 604
  • [7] Zero-sum semi-Markov games
    Jaskiewicz, A
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2002, 41 (03) : 723 - 739
  • [8] Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
    Li Y.
    Zhou J.
    Feng Y.
    Feng Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (01): 81 - 91
  • [9] Hamiltonian flows with random-walk behaviour originating from zero-sum games and fictitious play
    van Strien, Sebastian
    NONLINEARITY, 2011, 24 (06) : 1715 - 1742
  • [10] ZERO-SUM MARKOV GAMES WITH STOPPING AND IMPULSIVE STRATEGIES
    STETTNER, L
    APPLIED MATHEMATICS AND OPTIMIZATION, 1982, 9 (01): : 1 - 24