Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cited by: 0
Authors
Cen, Shicong [1 ]
Wei, Yuting [2 ]
Chi, Yuejie [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Univ Penn, Dept Stat & Data Sci, Wharton Sch, Philadelphia, PA USA
Funding
Andrew Mellon Foundation (USA);
Keywords
zero-sum Markov game; matrix game; entropy regularization; global convergence; multiplicative updates; no-regret learning; extragradient methods; VARIATIONAL INEQUALITY; CONVERGENCE
DOI
Not available
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification
0812;
Abstract
This paper investigates the problem of computing the equilibrium of competitive games in the form of two-player zero-sum games, which is often modeled as a constrained saddle-point optimization problem with probability simplex constraints. Despite recent efforts in understanding the last-iterate convergence of extragradient methods in the unconstrained setting, the theoretical underpinnings of these methods in constrained settings, especially those using multiplicative updates, remain highly inadequate, even when the objective function is bilinear. Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find quantal response equilibria (QREs), which are solutions to zero-sum two-player matrix games with entropy regularization, at a linear rate. The proposed algorithms can be implemented in a decentralized manner, where each player executes symmetric and multiplicative updates iteratively using its own payoff, without directly observing the opponent's actions. In addition, by controlling the amount of entropy regularization, the proposed algorithms can locate an approximate Nash equilibrium of the unregularized matrix game at a sublinear rate without assuming the Nash equilibrium to be unique. Our methods also lead to efficient policy extragradient algorithms for solving (entropy-regularized) zero-sum Markov games at similar rates. All of our convergence rates are nearly dimension-free: they are independent of the size of the state and action spaces up to logarithmic factors, highlighting the positive role of entropy regularization in accelerating convergence.
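To make the described update rule concrete, below is a minimal sketch of an extragradient method with symmetric multiplicative updates for the entropy-regularized matrix game max_x min_y x^T A y + tau*H(x) - tau*H(y). This is an illustration consistent with the abstract, not the paper's exact algorithm: the step size eta, the iteration count, and the helper multiplicative_update are assumptions made for the example.

```python
import numpy as np

def multiplicative_update(p, payoff, eta, tau):
    # p_new(a) ∝ p(a)^(1 - eta*tau) * exp(eta * payoff(a)):
    # a multiplicative (mirror-descent) step with entropy regularization.
    logits = (1.0 - eta * tau) * np.log(p) + eta * payoff
    logits -= logits.max()  # guard against overflow before exponentiating
    q = np.exp(logits)
    return q / q.sum()

def extragradient_qre(A, tau=0.1, eta=0.05, iters=2000):
    """Decentralized extragradient iterations for the entropy-regularized
    zero-sum matrix game; each player only uses its own payoff vector."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)  # row player's mixed strategy
    y = np.full(n, 1.0 / n)  # column player's mixed strategy
    for _ in range(iters):
        # extrapolation (midpoint) step from the current strategies
        x_mid = multiplicative_update(x, A @ y, eta, tau)
        y_mid = multiplicative_update(y, -A.T @ x, eta, tau)
        # update step, using payoffs evaluated at the midpoint
        x = multiplicative_update(x, A @ y_mid, eta, tau)
        y = multiplicative_update(y, -A.T @ x_mid, eta, tau)
    return x, y

# Usage: matching pennies, whose QRE is the uniform strategy for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_star, y_star = extragradient_qre(A)
print(x_star, y_star)  # both approach [0.5, 0.5]
```

The midpoint evaluation is what distinguishes this from plain multiplicative weights: taking gradients at the extrapolated strategies damps the cycling that bilinear games otherwise induce, which is the mechanism behind the last-iterate convergence discussed in the abstract.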
Pages
1-48 (48 pages)