On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Cited by: 0
Authors
Zhang, Runyu [1 ]
Mei, Jincheng [2 ]
Dai, Bo [2 ]
Schuurmans, Dale [2 ,3 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Google Res, Brain Team, Mountain View, CA USA
[3] Univ Alberta, Edmonton, AB, Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, in part because no projection is needed after each gradient update. In multi-agent systems, however, the lack of central coordination introduces significant additional difficulties into the convergence analysis. Even a stochastic game with identical interests can have multiple Nash equilibria (NEs), which rules out proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which includes identical-interest games as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization. The convergence rates established for the unregularized cases contain a trajectory-dependent constant that can be arbitrarily large, whereas log-barrier regularization overcomes this drawback at the cost of a slightly worse dependence on other factors such as the action set size. An empirical study on an identical-interest matrix game confirms the theoretical findings.
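The abstract describes decentralized softmax gradient play and a log-barrier-regularized variant. As a rough illustration only, the sketch below runs softmax gradient play on a two-player identical-interest matrix game (the setting of the paper's empirical study); the payoff matrix, step size, barrier weight, and iteration count are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

# Minimal sketch: decentralized softmax gradient play on a 2-player
# identical-interest matrix game. Each agent keeps its own logits and
# updates them independently using only its own policy gradient.

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_actions = 3
R = rng.standard_normal((n_actions, n_actions))     # shared payoff matrix (identical interest)

theta = [np.zeros(n_actions), np.zeros(n_actions)]  # agents' softmax logits
eta = 0.5     # step size (assumed)
lam = 0.0     # log-barrier weight; set e.g. 0.01 for the regularized variant

for _ in range(2000):
    pi1, pi2 = softmax(theta[0]), softmax(theta[1])
    v = pi1 @ R @ pi2            # common expected payoff (the potential here)

    q1 = R @ pi2                 # agent 1: value of each pure action against pi2
    q2 = R.T @ pi1               # agent 2: value of each pure action against pi1

    # Softmax policy gradient: d v / d theta_i[a] = pi_i[a] * (q_i[a] - v).
    # The lam-term is the gradient of a log-barrier regularizer of the form
    # (lam / n_actions) * sum_a log pi_i(a); the paper's exact form may differ.
    theta[0] += eta * (pi1 * (q1 - v) + lam * (1.0 / n_actions - pi1))
    theta[1] += eta * (pi2 * (q2 - v) + lam * (1.0 / n_actions - pi2))

print("joint payoff reached:", softmax(theta[0]) @ R @ softmax(theta[1]))
print("best joint payoff   :", R.max())
```

Because the interests are identical, the shared payoff acts as the potential and both agents ascend the same function with purely local updates; the paper's analysis quantifies how fast such independent updates approach a Nash equilibrium.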
Pages: 13
Related Papers
27 in total
  • [1] Policy Gradient Play with Networked Agents in Markov Potential Games
    Aydin, Sarper
    Eksin, Ceyhun
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023
  • [2] On the convergence of distributed projected gradient play with heterogeneous learning rates in monotone games
    Tan, Shaolin
    Tao, Ye
    Ran, Maopeng
    Liu, Hao
    SYSTEMS & CONTROL LETTERS, 2023, 182
  • [3] Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games
    Sun, Youbang
    Liu, Tao
    Zhou, Ruida
    Kumar, P. R.
    Shahrampour, Shahin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [4] Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games
    Zhou, Zhaoyi
    Chen, Zaiwei
    Lin, Yiheng
    Wierman, Adam
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2563 - 2573
  • [5] Policy Gradient Play over Time-Varying Networks in Markov Potential Games
    Aydin, Sarper
    Eksin, Ceyhun
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 1997 - 2002
  • [6] Decentralized Proximal Gradient Algorithms With Linear Convergence Rates
    Alghunaim, Sulaiman A.
    Ryu, Ernest K.
    Yuan, Kun
    Sayed, Ali H.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (06) : 2787 - 2794
  • [7] Convergence Bounds of Decentralized Fictitious Play Around a Single Nash Equilibrium in Near-Potential Games
    Aydin, Sarper
    Arefizadeh, Sina
    Eksin, Ceyhun
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2519 - 2524
  • [8] Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence
    Ding, Dongsheng
    Wei, Chen-Yu
    Zhang, Kaiqing
    Jovanovic, Mihailo R.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [9] On the Exponential Rate of Convergence of Fictitious Play in Potential Games
    Swenson, Brian
    Kar, Soummya
    2017 55TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2017, : 275 - 279
  • [10] Gradient Play in Stochastic Games: Stationary Points, Convergence, and Sample Complexity
    Zhang, Runyu
    Ren, Zhaolin
    Li, Na
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (10) : 6499 - 6514