Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Cited by: 0
Authors
Qiu, Shuang [1 ]
Wei, Xiaohan [2 ]
Ye, Jieping [1 ]
Wang, Zhaoran [3 ]
Yang, Zhuoran [4 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While single-agent policy optimization in a fixed environment has recently attracted substantial research attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
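To make the guarantee concrete: writing $\pi_t$ and $\nu_t$ for the two players' policies in episode $t$ and $V^{\pi,\nu}(s_1)$ for the max player's value from the initial state (notation assumed here for exposition, not taken from the paper), the max player's regret against a best fixed policy in hindsight is

$\mathrm{Reg}(T) = \max_{\pi} \sum_{t=1}^{T} \big( V^{\pi,\nu_t}(s_1) - V^{\pi_t,\nu_t}(s_1) \big),$

and the bound above says this grows only as $\tilde{O}(\sqrt{T})$. The interplay of UCB-type optimism and fictitious play can be illustrated on a far simpler stand-in problem. The sketch below runs fictitious play with a Hoeffding-style bonus on a two-player zero-sum matrix game whose payoff matrix is unknown and observed with noise; the rock-paper-scissors payoffs, bonus constant, noise level, and all variable names are illustrative assumptions, not the paper's algorithm, which operates on Markov games with factored-independent or single-controller transitions.

```python
# Minimal sketch (assumptions noted above, NOT the paper's algorithm):
# fictitious play with a UCB-style optimism bonus on a zero-sum matrix game.
import numpy as np

rng = np.random.default_rng(0)

# True payoff to the max player (rock-paper-scissors; unknown to the players).
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
n_max, n_min = A.shape

counts = np.zeros((n_max, n_min))       # visits to each joint action pair
payoff_sum = np.zeros((n_max, n_min))   # accumulated noisy payoffs
max_action_counts = np.ones(n_max)      # empirical action counts (max player)
min_action_counts = np.ones(n_min)      # empirical action counts (min player)

T = 5000
for t in range(1, T + 1):
    # Optimistic payoff estimates: empirical mean plus/minus a UCB bonus,
    # so each player is optimistic in its own favorable direction.
    mean = payoff_sum / np.maximum(counts, 1)
    bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    q_max = mean + bonus
    q_min = mean - bonus

    # Fictitious play: best-respond to the opponent's empirical mixture.
    mix_min = min_action_counts / min_action_counts.sum()
    mix_max = max_action_counts / max_action_counts.sum()
    a = int(np.argmax(q_max @ mix_min))   # max player's best response
    b = int(np.argmin(mix_max @ q_min))   # min player's best response

    # Play the joint action, observe a noisy payoff, update statistics.
    r = A[a, b] + rng.normal(scale=0.1)
    counts[a, b] += 1
    payoff_sum[a, b] += r
    max_action_counts[a] += 1
    min_action_counts[b] += 1

print("max player's empirical mixture:", max_action_counts / max_action_counts.sum())
print("min player's empirical mixture:", min_action_counts / min_action_counts.sum())
```

In rock-paper-scissors the unique Nash equilibrium is the uniform mixture, so both empirical mixtures should drift toward (1/3, 1/3, 1/3) as $T$ grows; the paper establishes the analogous $\tilde{O}(\sqrt{T})$ guarantee in the much richer Markov-game setting.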
Pages: 11
Related Papers
50 records in total
  • [1] Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games
    Zhao, Yulai
    Tian, Yuandong
    Lee, Jason D.
    Du, Simon S.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [2] FICTITIOUS PLAY IN ZERO-SUM STOCHASTIC GAMES
    Sayin, Muhammed O.
    Parise, Francesca
    Ozdaglar, Asuman
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2022, 60 (04) : 2095 - 2114
  • [3] Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
    Zhang, Kaiqing
    Yang, Zhuoran
    Basar, Tamer
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Smooth policy iteration for zero-sum Markov Games
    Lyu, Yao
    Wang, Wenxuan
    Li, Shengbo Eben
    Li, Zeyang
    Duan, Jingliang
    NEUROCOMPUTING, 2025, 630
  • [5] A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
    Xiong, Wei
    Zhong, Han
    Shi, Chengshuai
    Shen, Cong
    Zhang, Tong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] ZERO-SUM MARKOV GAMES WITH IMPULSE CONTROLS
    Basu, Arnab
    Stettner, Lukasz
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2020, 58 (01) : 580 - 604
  • [7] Zero-sum semi-Markov games
    Jaskiewicz, A
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2002, 41 (03) : 723 - 739
  • [8] Policy Gradient Algorithm in Two-Player Zero-Sum Markov Games
    Li Y.
    Zhou J.
    Feng Y.
    Feng Y.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (01): 81 - 91
  • [9] Hamiltonian flows with random-walk behaviour originating from zero-sum games and fictitious play
    van Strien, Sebastian
    NONLINEARITY, 2011, 24 (06) : 1715 - 1742
  • [10] ZERO-SUM MARKOV GAMES WITH STOPPING AND IMPULSIVE STRATEGIES
    STETTNER, L
    APPLIED MATHEMATICS AND OPTIMIZATION, 1982, 9 (01): : 1 - 24