On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Cited by: 0
Authors
Zhang, Runyu [1 ]
Mei, Jincheng [2 ]
Dai, Bo [2 ]
Schuurmans, Dale [2 ,3 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Google Res, Brain Team, Mountain View, CA USA
[3] Univ Alberta, Edmonton, AB, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly because no projection is needed after each gradient update. In multi-agent systems, however, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even a stochastic game with identical interests can have multiple Nash equilibria (NEs), which invalidates proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which includes identical-interest games as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization. The established convergence rates for the unregularized cases contain a trajectory-dependent constant that can be arbitrarily large, whereas log-barrier regularization overcomes this drawback at the cost of a slightly worse dependence on other factors such as the action set size. An empirical study on an identical-interest matrix game confirms the theoretical findings.
Pages: 13
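
Illustrative sketch. The empirical setting mentioned in the abstract, decentralized softmax gradient play on an identical-interest matrix game, can be sketched in a few lines. The code below is a minimal illustration and not the authors' implementation: the two-player setup, the random payoff matrix R, the step size eta, and the iteration count are all assumptions chosen for demonstration. Each player holds its own softmax logits and updates them using only its own marginal expected payoffs, i.e., without central coordination.

    # A minimal, illustrative sketch (not the authors' exact algorithm or code) of
    # decentralized softmax gradient play on a two-player identical-interest matrix
    # game. The payoff matrix R, step size eta, and iteration count are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n_actions = 3
    R = rng.uniform(size=(n_actions, n_actions))   # shared payoff: both players receive R[a1, a2]

    def softmax(theta):
        z = np.exp(theta - theta.max())            # shift for numerical stability
        return z / z.sum()

    theta1 = np.zeros(n_actions)                   # player 1 logits (softmax parameterization)
    theta2 = np.zeros(n_actions)                   # player 2 logits
    eta = 0.5                                      # step size

    for _ in range(2000):
        pi1, pi2 = softmax(theta1), softmax(theta2)
        q1 = R @ pi2                               # player 1's expected payoff per own action
        q2 = R.T @ pi1                             # player 2's expected payoff per own action
        v = pi1 @ q1                               # common value of the current joint policy
        # Softmax policy gradient of the shared value w.r.t. each player's own logits:
        # dV/dtheta_i[a] = pi_i[a] * (q_i[a] - V). Each update uses only local information.
        theta1 = theta1 + eta * pi1 * (q1 - v)
        theta2 = theta2 + eta * pi2 * (q2 - v)

    print("joint value after training:", pi1 @ R @ pi2)
    print("best achievable value     :", R.max())

The log-barrier regularized and natural gradient variants analyzed in the paper modify these updates with an additional regularization term and a preconditioner, respectively; their exact forms are not reproduced in this sketch.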