Linear Convergence of Independent Natural Policy Gradient in Games With Entropy Regularization

Cited: 0
Authors
Sun, Youbang [1 ]
Liu, Tao [2 ]
Kumar, P. R. [2 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
[2] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Source
IEEE CONTROL SYSTEMS LETTERS
Keywords
Games; Entropy; Convergence; Nash equilibrium; Reinforcement learning; Gradient methods; Approximation algorithms; Game theory; multi-agent reinforcement learning; natural policy gradient; quantal response equilibrium
DOI
10.1109/LCSYS.2024.3410149
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
This letter studies the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle providing exact policy evaluation, and each seeks to maximize its own independent reward. Each agent's reward depends on the actions of all agents in the system, giving rise to a game between agents. All agents make decisions under policies with bounded rationality, which is enforced by entropy regularization. In practice, smaller regularization makes agents more rational, so their behavior is closer to Nash policies; larger regularization makes agents act more randomly, which encourages exploration. We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE). Although the regularization assumptions prevent the QRE from approximating a Nash equilibrium (NE), our findings apply to a wide range of games, including cooperative, potential, and two-player matrix games. We also provide extensive empirical results on multiple games (including Markov games) that verify our theoretical analysis.
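The sketch below illustrates the kind of update the abstract describes; it is not the authors' implementation. For softmax policies, entropy-regularized NPG admits the standard closed-form multiplicative update pi_new(a) proportional to pi(a)^(1 - eta*tau) * exp(eta * q(a)) (as in, e.g., Cen et al., 2021, cited under Related Papers). The payoff matrices, step size eta, and regularization strength tau here are illustrative assumptions; the final lines check the QRE condition that each policy is a softmax (logit) response to the other.

import numpy as np

# Illustrative two-player general-sum matrix game; payoffs are assumed, not from the letter.
rng = np.random.default_rng(0)
n, m = 4, 5
R1 = rng.standard_normal((n, m))   # player 1's reward: r1(a1, a2) = R1[a1, a2]
R2 = rng.standard_normal((n, m))   # player 2's reward: r2(a1, a2) = R2[a1, a2]

tau = 1.0   # entropy-regularization strength: larger tau => more random play
eta = 0.5   # step size; eta * tau <= 1 keeps the exponent on pi nonnegative

def npg_step(pi, q):
    # Closed-form entropy-regularized NPG step for a softmax policy:
    # pi_new(a) proportional to pi(a)**(1 - eta*tau) * exp(eta * q(a)).
    logits = (1.0 - eta * tau) * np.log(pi) + eta * q
    logits -= logits.max()          # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

pi1 = np.full(n, 1.0 / n)           # uniform initial policies
pi2 = np.full(m, 1.0 / m)
for _ in range(300):
    q1 = R1 @ pi2                   # player 1's expected reward per action
    q2 = R2.T @ pi1                 # player 2's expected reward per action
    pi1, pi2 = npg_step(pi1, q1), npg_step(pi2, q2)   # independent updates

# At a QRE, each policy is the logit response to the other:
# pi_i(a) proportional to exp(q_i(a) / tau).
br1 = np.exp(R1 @ pi2 / tau)
br1 /= br1.sum()
br2 = np.exp(R2.T @ pi1 / tau)
br2 /= br2.sum()
print("max QRE gap:", max(np.abs(pi1 - br1).max(), np.abs(pi2 - br2).max()))

Under the letter's "sufficient entropy regularization" condition (tau large relative to the payoff scale), the printed gap should shrink geometrically with the number of iterations, consistent with the claimed linear rate.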
Pages: 1217 - 1222
Page count: 6
Related Papers (showing 10 of 50)
  • [1] Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization
    Cen, Shicong
    Chen, Fan
    Chi, Yuejie
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2833 - 2838
  • [2] Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
    Cen, Shicong
    Cheng, Chen
    Chen, Yuxin
    Wei, Yuting
    Chi, Yuejie
    OPERATIONS RESEARCH, 2021, 70 (04) : 2563 - 2578
  • [3] Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
    Sun, Youbang
    Liu, Tao
    Zhou, Ruida
    Kumar, P. R.
    Shahrampour, Shahin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] CONVERGENCE OF ENTROPY-REGULARIZED NATURAL POLICY GRADIENT WITH LINEAR FUNCTION APPROXIMATION
    Cayci, Semih
    He, Niao
    Srikant, R.
    SIAM JOURNAL ON OPTIMIZATION, 2024, 34 (03) : 2729 - 2755
  • [5] On the Linear Convergence of Natural Policy Gradient Algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 3794 - 3799
  • [6] On linear and super-linear convergence of Natural Policy Gradient algorithm
    Khodadadian, Sajad
    Jhunjhunwala, Prakirt Raj
    Varma, Sushil Mahavir
    Maguluri, Siva Theja
    SYSTEMS & CONTROL LETTERS, 2022, 164
  • [7] Independent Natural Policy Gradient Always Converges in Markov Potential Games
    Fox, Roy
    McAleer, Stephen
    Overman, William
    Panageas, Ioannis
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [8] Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
    Cen, Shicong
    Wei, Yuting
    Chi, Yuejie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [9] Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
    Cen, Shicong
    Wei, Yuting
    Chi, Yuejie
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 48
  • [10] Geometry and convergence of natural policy gradient methods
    Müller, J.
    Montúfar, G.
    INFORMATION GEOMETRY, 2024, 7 (Suppl 1): 485 - 523