Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Citations: 0
Authors
Qiu, Shuang [1 ]
Wei, Xiaohan [2 ]
Ye, Jieping [1 ]
Wang, Zhaoran [3 ]
Yang, Zhuoran [4 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding
National Science Foundation (USA);
Keywords
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While single-agent policy optimization in a fixed environment has attracted considerable recent attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms combine Upper Confidence Bound (UCB)-type optimism with fictitious play for simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
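The regret benchmark above is the standard one: writing $\mu_t, \nu_t$ for the two players' policies at step $t$ and $V^{\mu,\nu}$ for player 1's value, player 1's regret is $\mathrm{Reg}_1(T) = \max_{\mu} \sum_{t=1}^{T} \bigl( V^{\mu,\nu_t} - V^{\mu_t,\nu_t} \bigr)$, with the maximizing $\mu$ being the single best policy in hindsight (this notation is a reconstruction from the abstract; the paper's exact definition may differ in details). To make the two algorithmic ingredients concrete, the sketch below combines fictitious play (best-responding to the opponent's empirical action mixture) with UCB-type optimism (an exploration bonus on estimated payoffs) in a zero-sum matrix game, the stateless special case of a Markov game. It is illustrative only, not the paper's algorithm: the rock-paper-scissors payoff matrix, the bonus constant, and the horizon are assumptions.

```python
import numpy as np

# Minimal sketch: optimistic fictitious play in a zero-sum *matrix* game.
# Assumptions (not from the paper): the payoff matrix A, the UCB bonus
# constant 2, and the horizon T. Both players observe realized payoffs
# of the joint actions played but do not know A in advance.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])        # row player's payoffs (rock-paper-scissors)
n, m = A.shape
T = 50_000

payoff_sum = np.zeros((n, m))        # running sums of observed payoffs
visits = np.zeros((n, m))            # visit counts per joint action
row_counts = np.ones(n)              # fictitious-play beliefs: empirical action counts
col_counts = np.ones(m)
row_total = 0.0

for t in range(1, T + 1):
    A_hat = payoff_sum / np.maximum(visits, 1.0)
    bonus = np.sqrt(2.0 * np.log(t + 1.0) / np.maximum(visits, 1.0))
    # Fictitious play with UCB-type optimism: each player best-responds to
    # the opponent's empirical mixture under an optimistically shifted payoff
    # estimate (inflated for the maximizer, deflated for the minimizer).
    i = int(np.argmax((A_hat + bonus) @ (col_counts / col_counts.sum())))
    j = int(np.argmin((row_counts / row_counts.sum()) @ (A_hat - bonus)))
    row_counts[i] += 1
    col_counts[j] += 1
    payoff_sum[i, j] += A[i, j]
    visits[i, j] += 1
    row_total += A[i, j]

# Row player's regret against the best fixed action in hindsight,
# mirroring the "single best policy in hindsight" benchmark above.
best_fixed = (A @ (col_counts - np.ones(m))).max()
print(f"row regret after T={T} steps: {best_fixed - row_total:.1f}")
```

In the paper's setting it is the transitions, not the payoffs, that are unknown and structured (factored independent or single-controller), so the optimism there targets the transition model; the sketch only mirrors the interplay of optimistic estimation and fictitious-play best responses.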
Pages: 11