Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source: IEEE Control Systems Letters
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium
DOI
10.1109/LCSYS.2024.3410149
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation, and each seeks to maximize its own reward. Each agent's reward depends on the actions of all agents in the multi-agent system, leading to a game between agents. All agents make decisions under a policy with bounded rationality, which is enforced through entropy regularization. In practice, smaller regularization implies that agents are more rational and behave closer to Nash policies, whereas larger regularization makes agents act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as verification of our theoretical analysis.
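For intuition, the sketch below simulates entropy-regularized independent NPG with exact policy evaluation (the oracle assumed in the abstract) on a random two-player matrix game. The payoff matrices, step size eta, and regularization strength tau are illustrative assumptions, and the multiplicative update used is the standard closed form of the natural-gradient step under softmax policies with entropy regularization, not necessarily the letter's exact formulation.

```python
import numpy as np

# Minimal sketch: entropy-regularized independent NPG in a random
# two-player matrix game. Payoffs, eta, and tau are illustrative choices.
rng = np.random.default_rng(0)
n_actions = 3
A = rng.standard_normal((n_actions, n_actions))  # payoffs for player 1
B = rng.standard_normal((n_actions, n_actions))  # payoffs for player 2

tau = 1.0  # entropy-regularization strength (larger => more random play)
eta = 0.2  # NPG step size, kept small relative to tau

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

# Both players start from the uniform policy.
pi1 = np.ones(n_actions) / n_actions
pi2 = np.ones(n_actions) / n_actions

for t in range(200):
    # Exact policy evaluation: expected payoff of each action
    # against the opponent's current policy.
    q1 = A @ pi2
    q2 = B.T @ pi1
    # Under softmax policies, the entropy-regularized NPG step reduces to
    # the multiplicative update pi_{t+1}(a) ∝ pi_t(a)^(1-eta*tau) * exp(eta*q(a)).
    pi1 = softmax((1.0 - eta * tau) * np.log(pi1) + eta * q1)
    pi2 = softmax((1.0 - eta * tau) * np.log(pi2) + eta * q2)

# At the QRE, each policy is a softmax best response to the other,
# pi = softmax(q / tau); the gaps below should be near zero.
br1 = softmax(A @ pi2 / tau)
br2 = softmax(B.T @ pi1 / tau)
print("player 1 QRE gap:", np.abs(pi1 - br1).max())
print("player 2 QRE gap:", np.abs(pi2 - br2).max())
```

Logging the QRE gap at each iteration and plotting it on a log scale makes the linear (geometric) convergence rate claimed in the letter directly visible; the contraction is governed by the factor 1 - eta * tau in the update.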
Pages: 1217-1222
Page count: 6