Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Citations: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE CONTROL SYSTEMS LETTERS
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium;
DOI
10.1109/LCSYS.2024.3410149
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This letter focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation and seek to maximize their respective individual rewards. Each agent's reward is assumed to depend on the actions of all agents in the multi-agent system, leading to a game between agents. All agents make decisions under a policy with bounded rationality, which is enforced by the introduction of entropy regularization. In practice, smaller regularization implies that agents are more rational and behave closer to Nash policies, whereas larger regularization makes agents act more randomly, which ensures more exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) as a verification of our theoretical analysis.
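The algorithm described in the abstract can be illustrated concretely. Below is a minimal sketch (not the authors' code) of entropy-regularized independent NPG in a two-player matrix game with exact oracle evaluation; the payoff matrices A and B, the temperature tau, the step size eta, and the iteration count are illustrative assumptions. With softmax policies, the entropy-regularized NPG update takes the multiplicative form pi^{t+1}(a) ∝ pi^t(a)^{1 - eta*tau} * exp(eta * Q^t(a)), and its fixed point satisfies the QRE condition pi_i(a) ∝ exp(Q_i(a) / tau).

```python
# Minimal sketch of entropy-regularized independent NPG in a random
# two-player matrix game (illustrative assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 5                       # action-set sizes of the two players
A = rng.standard_normal((n1, n2))   # player 1's payoff matrix (assumed data)
B = rng.standard_normal((n1, n2))   # player 2's payoff matrix (assumed data)

tau = 1.0   # entropy-regularization temperature (larger -> more exploration)
eta = 0.5   # step size; theory requires it small enough relative to tau

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

pi1 = np.ones(n1) / n1   # uniform initial policies
pi2 = np.ones(n2) / n2

for _ in range(2000):
    # Exact oracle evaluation: expected payoff of each action against the
    # opponent's current policy.
    q1 = A @ pi2
    q2 = B.T @ pi1
    # Independent entropy-regularized NPG update in policy space:
    # pi(a) <- pi(a)^(1 - eta*tau) * exp(eta * q(a)), renormalized.
    pi1 = softmax((1 - eta * tau) * np.log(pi1) + eta * q1)
    pi2 = softmax((1 - eta * tau) * np.log(pi2) + eta * q2)

# At a QRE, each policy equals the softmax of its own regularized payoffs.
print("QRE residual, player 1:", np.abs(pi1 - softmax(A @ pi2 / tau)).max())
print("QRE residual, player 2:", np.abs(pi2 - softmax(B.T @ pi1 / tau)).max())
```

With sufficiently large tau (or sufficiently small eta), the iteration above is a contraction and the residuals shrink at a linear (geometric) rate, which is the behavior the letter establishes; with very small tau, convergence to the QRE is no longer guaranteed by the stated result.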
Pages: 1217-1222
Page count: 6