Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Citations: 0
Authors
Qiu, Shuang [1 ]
Wei, Xiaohan [2 ]
Ye, Jieping [1 ]
Wang, Zhaoran [3 ]
Yang, Zhuoran [4 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Facebook Inc, Menlo Pk, CA 94025 USA
[3] Northwestern Univ, Evanston, IL 60208 USA
[4] Princeton Univ, Princeton, NJ 08544 USA
Funding
National Science Foundation (USA);
Keywords
CONSTRAINED STOCHASTIC GAMES; SINGLE-CONTROLLER; BOUNDS; GO;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
While single-agent policy optimization in a fixed environment has attracted considerable recent attention in the reinforcement learning community, much less is known theoretically when multiple agents play in a potentially competitive environment. We take a step forward by proposing and analyzing new fictitious play policy optimization algorithms for two-player zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transitions and single-controller transitions. For both scenarios, we prove tight $\tilde{O}(\sqrt{T})$ regret bounds after $T$ steps in a two-agent competitive game. The regret of each player is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms combine Upper Confidence Bound (UCB)-type optimism with fictitious play for simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\tilde{O}(\sqrt{T})$.
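The regret benchmark above is the standard one: writing $\mu_t, \nu_t$ for the two players' policies at step $t$ and $V^{\mu,\nu}$ for player 1's value, player 1's regret is $\mathrm{Reg}_1(T) = \max_{\mu} \sum_{t=1}^{T} \bigl( V^{\mu,\nu_t} - V^{\mu_t,\nu_t} \bigr)$, with the maximizing $\mu$ being the single best policy in hindsight (this notation is a reconstruction from the abstract; the paper's exact definition may differ in details). To make the two algorithmic ingredients concrete, the sketch below combines fictitious play (best-responding to the opponent's empirical action mixture) with UCB-type optimism (an exploration bonus on estimated payoffs) in a zero-sum matrix game, the stateless special case of a Markov game. It is illustrative only, not the paper's algorithm: the rock-paper-scissors payoff matrix, the bonus constant, and the horizon are assumptions.

```python
import numpy as np

# Minimal sketch: optimistic fictitious play in a zero-sum *matrix* game.
# Assumptions (not from the paper): the payoff matrix A, the UCB bonus
# constant 2, and the horizon T. Both players observe realized payoffs
# of the joint actions played but do not know A in advance.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])        # row player's payoffs (rock-paper-scissors)
n, m = A.shape
T = 50_000

payoff_sum = np.zeros((n, m))        # running sums of observed payoffs
visits = np.zeros((n, m))            # visit counts per joint action
row_counts = np.ones(n)              # fictitious-play beliefs: empirical action counts
col_counts = np.ones(m)
row_total = 0.0

for t in range(1, T + 1):
    A_hat = payoff_sum / np.maximum(visits, 1.0)
    bonus = np.sqrt(2.0 * np.log(t + 1.0) / np.maximum(visits, 1.0))
    # Fictitious play with UCB-type optimism: each player best-responds to
    # the opponent's empirical mixture under an optimistically shifted payoff
    # estimate (inflated for the maximizer, deflated for the minimizer).
    i = int(np.argmax((A_hat + bonus) @ (col_counts / col_counts.sum())))
    j = int(np.argmin((row_counts / row_counts.sum()) @ (A_hat - bonus)))
    row_counts[i] += 1
    col_counts[j] += 1
    payoff_sum[i, j] += A[i, j]
    visits[i, j] += 1
    row_total += A[i, j]

# Row player's regret against the best fixed action in hindsight,
# mirroring the "single best policy in hindsight" benchmark above.
best_fixed = (A @ (col_counts - np.ones(m))).max()
print(f"row regret after T={T} steps: {best_fixed - row_total:.1f}")
```

In the paper's setting it is the transitions, not the payoffs, that are unknown and structured (factored independent or single-controller), so the optimism there targets the transition model; the sketch only mirrors the interplay of optimistic estimation and fictitious-play best responses.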
Pages: 11