Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Cited by: 0
Authors
Cen, Shicong [1 ]
Wei, Yuting [2 ]
Chi, Yuejie [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Univ Penn, Dept Stat & Data Sci, Wharton Sch, Philadelphia, PA USA
Funding
Andrew Mellon Foundation (USA);
Keywords
zero-sum Markov game; matrix game; entropy regularization; global convergence; multiplicative updates; no-regret learning; extragradient methods; variational inequality; convergence
DOI
Not available
CLC number
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
This paper investigates the problem of computing the equilibrium of competitive games in the form of two-player zero-sum games, which is often modeled as a constrained saddle-point optimization problem with probability simplex constraints. Despite recent efforts in understanding the last-iterate convergence of extragradient methods in the unconstrained setting, the theoretical underpinnings of these methods in constrained settings, especially those using multiplicative updates, remain highly inadequate, even when the objective function is bilinear. Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, we develop provably efficient extragradient methods to find the quantal response equilibrium (QRE), i.e., the solution of the zero-sum two-player matrix game with entropy regularization, at a linear rate. The proposed algorithms can be implemented in a decentralized manner, where each player executes symmetric and multiplicative updates iteratively using its own payoff without observing the opponent's actions directly. In addition, by controlling the knob of entropy regularization, the proposed algorithms can locate an approximate Nash equilibrium of the unregularized matrix game at a sublinear rate without assuming the Nash equilibrium to be unique. Our methods also lead to efficient policy extragradient algorithms for solving (entropy-regularized) zero-sum Markov games at similar rates. All of our convergence rates are nearly dimension-free, i.e., independent of the size of the state and action spaces up to logarithmic factors, highlighting the positive role of entropy regularization for accelerating convergence.
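To make the abstract's description concrete, below is a minimal Python/NumPy sketch of an extragradient scheme with symmetric multiplicative updates for an entropy-regularized zero-sum matrix game. It is an illustration of the general technique, not the paper's exact algorithm or parameter choices; the payoff matrix A, regularization strength tau, step size eta, iteration count, and all function names here are illustrative assumptions.

    import numpy as np

    def _project(logits):
        # Map unnormalized log-weights onto the probability simplex (stable softmax).
        z = logits - logits.max()
        w = np.exp(z)
        return w / w.sum()

    def entropy_reg_extragradient(A, tau=0.1, eta=0.05, num_iters=2000):
        # Sketch (assumed form): predict-then-correct multiplicative updates for the
        # entropy-regularized zero-sum matrix game
        #     min_x max_y  x^T A y - tau*H(x) + tau*H(y),
        # where x, y lie on probability simplexes and H is the Shannon entropy.
        m, n = A.shape
        x = np.ones(m) / m   # min player's mixed strategy
        y = np.ones(n) / n   # max player's mixed strategy
        for _ in range(num_iters):
            # Prediction (midpoint) step: each player only uses its own payoff
            # vector (A @ y for the min player, A.T @ x for the max player).
            x_mid = _project((1 - eta * tau) * np.log(x) - eta * (A @ y))
            y_mid = _project((1 - eta * tau) * np.log(y) + eta * (A.T @ x))
            # Extragradient correction step, evaluated at the predicted strategies.
            x = _project((1 - eta * tau) * np.log(x) - eta * (A @ y_mid))
            y = _project((1 - eta * tau) * np.log(y) + eta * (A.T @ x_mid))
        return x, y

    # Example usage: the returned pair approximates the QRE of the regularized game,
    # whose fixed point satisfies x ∝ exp(-A y / tau) and y ∝ exp(A^T x / tau).
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((3, 3))
        x, y = entropy_reg_extragradient(A)
        print(x, y)

Consistent with the abstract, shrinking tau in such a scheme drives the QRE toward an approximate Nash equilibrium of the unregularized matrix game, at the cost of a slower (sublinear) overall rate.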
Pages: 1-48
Page count: 48