On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Cited by: 0
Authors
Zhang, Runyu [1 ]
Mei, Jincheng [2 ]
Dai, Bo [2 ]
Schuurmans, Dale [2 ,3 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] Google Res, Brain Team, Mountain View, CA USA
[3] Univ Alberta, Edmonton, AB, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Softmax policy gradient is a popular algorithm for policy optimization in single-agent reinforcement learning, particularly because no projection is needed after each gradient update. In multi-agent systems, however, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even a stochastic game with identical interests can have multiple Nash equilibria (NEs), which invalidates proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms to find NEs. In this paper, we study the finite-time convergence of decentralized softmax gradient play in a special class of games, Markov Potential Games (MPGs), which includes identical-interest games as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization. The established convergence rates for the unregularized cases contain a trajectory-dependent constant that can be arbitrarily large, whereas log-barrier regularization overcomes this drawback at the cost of a slightly worse dependence on other factors such as the action set size. An empirical study on an identical-interest matrix game confirms the theoretical findings.
Pages: 13
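
Illustrative sketch. The empirical setting mentioned in the abstract, decentralized softmax gradient play on an identical-interest matrix game, can be sketched in a few lines. The code below is a minimal illustration and not the authors' implementation: the two-player setup, the random payoff matrix R, the step size eta, and the iteration count are all assumptions chosen for demonstration. Each player holds its own softmax logits and updates them using only its own marginal expected payoffs, i.e., without central coordination.

    # A minimal, illustrative sketch (not the authors' exact algorithm or code) of
    # decentralized softmax gradient play on a two-player identical-interest matrix
    # game. The payoff matrix R, step size eta, and iteration count are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n_actions = 3
    R = rng.uniform(size=(n_actions, n_actions))   # shared payoff: both players receive R[a1, a2]

    def softmax(theta):
        z = np.exp(theta - theta.max())            # shift for numerical stability
        return z / z.sum()

    theta1 = np.zeros(n_actions)                   # player 1 logits (softmax parameterization)
    theta2 = np.zeros(n_actions)                   # player 2 logits
    eta = 0.5                                      # step size

    for _ in range(2000):
        pi1, pi2 = softmax(theta1), softmax(theta2)
        q1 = R @ pi2                               # player 1's expected payoff per own action
        q2 = R.T @ pi1                             # player 2's expected payoff per own action
        v = pi1 @ q1                               # common value of the current joint policy
        # Softmax policy gradient of the shared value w.r.t. each player's own logits:
        # dV/dtheta_i[a] = pi_i[a] * (q_i[a] - V). Each update uses only local information.
        theta1 = theta1 + eta * pi1 * (q1 - v)
        theta2 = theta2 + eta * pi2 * (q2 - v)

    print("joint value after training:", pi1 @ R @ pi2)
    print("best achievable value     :", R.max())

The log-barrier regularized and natural gradient variants analyzed in the paper modify these updates with an additional regularization term and a preconditioner, respectively; their exact forms are not reproduced in this sketch.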