Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Cited by: 0
Authors:
Qiu, Shuang [1]
Wei, Xiaohan [2]
Ye, Jieping [1]
Wang, Zhaoran [3]
Yang, Zhuoran [4]
Affiliations:
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding:
National Science Foundation (USA)
Keywords:
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO;
DOI:
Not available
CLC classification number:
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
While single-agent policy optimization in a fixed environment has recently attracted substantial attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game scenario. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
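To make the regret criterion concrete, one common formalization in this literature (the notation below is assumed for illustration, not taken from the record) measures player 1's regret against the best fixed policy in hindsight:

$$\mathrm{Regret}_1(T) \;=\; \max_{\pi} \sum_{t=1}^{T} \Big( V^{\pi,\,\nu_t}(s_1) \;-\; V^{\pi_t,\,\nu_t}(s_1) \Big),$$

where $\pi_t$ and $\nu_t$ are the two players' policies at episode $t$ and $V^{\pi,\nu}(s_1)$ is player 1's value under the joint policy $(\pi, \nu)$. The abstract's guarantee is that this quantity is $\tilde{O}(\sqrt{T})$ for each player.

The sketch below illustrates the two ingredients named in the abstract — fictitious play against the opponent's empirical action frequencies, plus UCB-type optimism over unknown payoffs — in a repeated zero-sum matrix game rather than a full Markov game. It is a minimal illustration under that simplifying assumption; every name and constant in it is ours, not the authors' algorithm.

```python
# Illustrative sketch: optimistic fictitious play in a repeated two-player
# zero-sum *matrix* game -- a simplification of the Markov-game setting in
# the paper. All names and constants here are assumptions, not the authors'
# method.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(4, 4))   # true payoff matrix, unknown to both players

T = 5000
counts = np.ones((4, 4))              # visit counts per joint action (start at 1 to avoid /0)
payoff_sum = np.zeros((4, 4))         # running sum of observed payoffs
row_hist = np.zeros(4)                # fictitious-play counts of the row player's actions
col_hist = np.zeros(4)                # fictitious-play counts of the column player's actions

for t in range(1, T + 1):
    mean = payoff_sum / counts
    bonus = np.sqrt(2 * np.log(t + 1) / counts)   # UCB-style exploration bonus
    # Row player (maximizer): best response, under an optimistic payoff model,
    # to the opponent's empirical (fictitious-play) action frequencies.
    col_mix = (col_hist + 1) / (col_hist.sum() + 4)
    i = np.argmax((mean + bonus) @ col_mix)
    # Column player (minimizer): symmetric, with a pessimistic payoff model.
    row_mix = (row_hist + 1) / (row_hist.sum() + 4)
    j = np.argmin(row_mix @ (mean - bonus))
    r = A[i, j] + 0.1 * rng.standard_normal()     # noisy payoff observation
    counts[i, j] += 1
    payoff_sum[i, j] += r
    row_hist[i] += 1
    col_hist[j] += 1

# The empirical joint play should approach the matrix-game value.
print("empirical value:", (row_hist / T) @ A @ (col_hist / T))
```

Each player best-responds to the opponent's empirical mixture under an optimistically (respectively pessimistically) perturbed payoff model; this is the matrix-game analogue of combining fictitious play with UCB-type optimism, the two components the abstract attributes to the proposed algorithms.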
Pages: 11