Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cited by: 0
Authors
Cen, Shicong [1 ]
Wei, Yuting [2 ]
Chi, Yuejie [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Univ Penn, Dept Stat & Data Sci, Wharton Sch, Philadelphia, PA USA
Funding
Andrew Mellon Foundation, USA
Keywords
zero-sum Markov game; matrix game; entropy regularization; global convergence; multiplicative updates; no-regret learning; extragradient methods; variational inequality; convergence
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
This paper investigates the problem of computing the equilibrium of competitive games in the form of two-player zero-sum games, which is often modeled as a constrained saddle-point optimization problem with probability simplex constraints. Despite recent efforts in understanding the last-iterate convergence of extragradient methods in the unconstrained setting, the theoretical underpinnings of these methods in constrained settings, especially those using multiplicative updates, remain highly inadequate, even when the objective function is bilinear. Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods that find the quantal response equilibrium (QRE), namely the solution of the zero-sum two-player matrix game with entropy regularization, at a linear rate. The proposed algorithms can be implemented in a decentralized manner, where each player executes symmetric and multiplicative updates iteratively using its own payoff without observing the opponent's actions directly. In addition, by controlling the knob of entropy regularization, the proposed algorithms can locate an approximate Nash equilibrium of the unregularized matrix game at a sublinear rate without assuming the Nash equilibrium to be unique. Our methods also lead to efficient policy extragradient algorithms for solving (entropy-regularized) zero-sum Markov games at similar rates. All of our convergence rates are nearly dimension-free, i.e., independent of the size of the state and action spaces up to logarithmic factors, highlighting the positive role of entropy regularization in accelerating convergence.
Pages: 1-48
Page count: 48
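
To make the symmetric, multiplicative updates described in the abstract more concrete, below is a minimal NumPy sketch of an extragradient-style iteration with entropy-regularized multiplicative updates for a zero-sum matrix game. The payoff matrix A, the step size eta, the regularization strength tau, the (1 - eta*tau) damping of the log-probabilities, and the exact two-stage (midpoint-then-update) form are illustrative assumptions, not a reproduction of the paper's algorithm or constants.

import numpy as np

def entropy_reg_extragradient(A, tau=0.1, eta=0.05, iters=2000):
    """Sketch of an extragradient method with multiplicative updates for the
    entropy-regularized zero-sum matrix game
        max_x min_y  x^T A y + tau*H(x) - tau*H(y),
    where H is the Shannon entropy and x, y live on probability simplices.
    Parameter choices and update form are illustrative assumptions."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)   # row player's mixed strategy (uniform start)
    y = np.full(n, 1.0 / n)   # column player's mixed strategy (uniform start)

    def mult_update(p, grad, sign):
        # multiplicative update softened by entropy regularization:
        #   p_new(a) proportional to p(a)^(1 - eta*tau) * exp(sign * eta * grad_a)
        logits = (1.0 - eta * tau) * np.log(p) + sign * eta * grad
        logits -= logits.max()            # for numerical stability
        p_new = np.exp(logits)
        return p_new / p_new.sum()

    for _ in range(iters):
        # extrapolation (midpoint) step from the current strategies
        x_bar = mult_update(x, A @ y, +1.0)
        y_bar = mult_update(y, A.T @ x, -1.0)
        # update step: same base strategies, but payoffs at the midpoint
        x = mult_update(x, A @ y_bar, +1.0)
        y = mult_update(y, A.T @ x_bar, -1.0)
    return x, y

if __name__ == "__main__":
    # rock-paper-scissors payoff for the row player; the symmetric QRE is uniform
    A = np.array([[0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0],
                  [-1.0, 1.0, 0.0]])
    x, y = entropy_reg_extragradient(A)
    print("row strategy:", np.round(x, 3))
    print("col strategy:", np.round(y, 3))

Each player's update here depends only on its own strategy and the payoff induced by the opponent's (midpoint) strategy, which mirrors the decentralized, symmetric structure the abstract attributes to the proposed methods.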