Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited by: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE Control Systems Letters
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium
DOI
10.1109/LCSYS.2024.3410149
Chinese Library Classification
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation, and each agent seeks to maximize its own independent reward. Each agent's reward depends on the actions of all agents in the system, giving rise to a game between agents. All agents make decisions under bounded rationality, which is enforced by entropy regularization: smaller regularization makes agents more rational, so their behavior is closer to Nash policies, whereas larger regularization makes them act more randomly, which ensures more exploration. We show that, with sufficiently large entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from closely approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) to verify our theoretical analysis.
Pages: 1217-1222
Page count: 6
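
To make the abstract concrete, below is a minimal, hedged sketch (not the authors' code) of independent entropy-regularized NPG on a two-player matrix game. It assumes the standard softmax-NPG multiplicative update from the entropy-regularized policy gradient literature, pi_{t+1}(a) proportional to pi_t(a)^(1 - eta*tau) * exp(eta * q_t(a)); the payoff matrices, step size eta, and temperature tau are illustrative choices, and the letter's exact step-size and regularization conditions for linear convergence are not reproduced here.

# Hedged illustration: independent, entropy-regularized NPG on a 2-player matrix game.
# The update form is the standard softmax-NPG / multiplicative-weights rule assumed above.
import numpy as np

def independent_npg(A, B, tau=1.0, eta=0.1, iters=2000, seed=0):
    """A, B: payoff matrices of agents 1 and 2 (agent 1 picks rows, agent 2 picks columns)."""
    rng = np.random.default_rng(seed)
    # start from random interior (fully mixed) policies
    x = rng.random(A.shape[0]); x /= x.sum()
    y = rng.random(A.shape[1]); y /= y.sum()
    for _ in range(iters):
        q1 = A @ y      # agent 1's expected payoff of each row action against y
        q2 = B.T @ x    # agent 2's expected payoff of each column action against x
        # entropy-regularized NPG step in multiplicative-weights form
        x = x ** (1 - eta * tau) * np.exp(eta * q1)
        y = y ** (1 - eta * tau) * np.exp(eta * q2)
        x /= x.sum(); y /= y.sum()
    return x, y

if __name__ == "__main__":
    # a small identical-interest (cooperative) game as an illustration
    A = np.array([[1.0, 0.0], [0.0, 2.0]])
    x, y = independent_npg(A, A, tau=1.0, eta=0.1)
    print("agent 1 policy:", np.round(x, 3))
    print("agent 2 policy:", np.round(y, 3))

With tau > 0, a fixed point of this update satisfies x proportional to exp(q1 / tau) (and likewise for y), which is the quantal response (logit) equilibrium condition referenced in the abstract; as tau shrinks the QRE moves toward a Nash equilibrium, at the cost of the convergence guarantee discussed in the letter.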