Tight last-iterate convergence rates for no-regret learning in multi-player games

Cited by: 0
Authors
Golowich, Noah [1 ]
Pattathil, Sarath [2 ]
Daskalakis, Constantinos [1 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] MIT, EECS, Cambridge, MA 02139 USA
Keywords
DYNAMICS;
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of O(1/√T) with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the O(1/√T) rate is tight for all p-SCLI algorithms, which includes OG as a special case. As a byproduct of our lower bound analysis we additionally present a proof of a conjecture of Arjevani et al. (2015) which is more direct than previous approaches.
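The abstract's claim can be illustrated numerically. Below is a minimal sketch (not the paper's analysis) of the optimistic gradient update z_{t+1} = z_t − η(2F(z_t) − F(z_{t−1})) on the bilinear game min_x max_y x·y, an assumed illustrative instance of a smooth monotone game with operator F(x, y) = (y, −x). Plain gradient descent-ascent with the same constant step-size spirals outward on this game, while the last iterate of OG converges to the equilibrium (0, 0); all names and parameter values here are illustrative choices, not from the paper.

```python
import math

def F(z):
    # Monotone operator of the bilinear game min_x max_y x*y:
    # F(x, y) = (df/dx, -df/dy) = (y, -x). Its unique equilibrium is (0, 0).
    x, y = z
    return (y, -x)

def optimistic_gradient(z0, eta=0.1, T=2000):
    """OG with constant step-size: z_{t+1} = z_t - eta*(2*F(z_t) - F(z_{t-1}))."""
    z_prev, z = z0, z0
    for _ in range(T):
        gx, gy = F(z)          # current operator value
        px, py = F(z_prev)     # previous operator value ("optimistic" correction)
        z_prev, z = z, (z[0] - eta * (2 * gx - px),
                        z[1] - eta * (2 * gy - py))
    return z

def gda(z0, eta=0.1, T=2000):
    """Plain gradient descent-ascent: z_{t+1} = z_t - eta*F(z_t)."""
    z = z0
    for _ in range(T):
        gx, gy = F(z)
        z = (z[0] - eta * gx, z[1] - eta * gy)
    return z

z0 = (1.0, 1.0)
og_dist = math.hypot(*optimistic_gradient(z0))   # distance of last iterate to (0, 0)
gda_dist = math.hypot(*gda(z0))
```

On this instance the last OG iterate contracts toward the equilibrium (og_dist is tiny), whereas the GDA iterate diverges (gda_dist grows large), matching the motivation for studying optimistic methods.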
Pages: 13
Related papers (37 total)
  • [21] Iterative ADP learning algorithms for discrete-time multi-player games
    He Jiang
    Huaguang Zhang
    Artificial Intelligence Review, 2018, 50 : 75 - 91
  • [22] Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence beyond the Minty Property
    Anagnostides, Ioannis
    Panageas, Ioannis
    Farina, Gabriele
    Sandholm, Tuomas
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 9451 - 9459
  • [23] Inverse Reinforcement Learning for Multi-player Apprentice Games in Continuous-Time Nonlinear Systems
    Lian, Bosen
    Xue, Wenqian
    Lewis, Frank L.
    Chai, Tianyou
    Davoudi, Ali
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 803 - 808
  • [24] Min-Max Q-learning for multi-player pursuit-evasion games
    Selvakumar, Jhanani
    Bakolas, Efstathios
    NEUROCOMPUTING, 2022, 475 : 1 - 14
  • [25] Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning
    Li, Jinna
    Xiao, Zhenfei
    Li, Ping
    IEEE ACCESS, 2019, 7 : 134647 - 134659
  • [26] EVENT-TRIGGERED ADAPTIVE CONTROL FOR NONLINEAR MULTI-PLAYER GAMES USING NEURAL CRITIC LEARNING
    Li, Ping
    Zhang, Huiyan
    Ao, Wengang
    Liu, Pengda
    International Journal of Innovative Computing, Information and Control, 2024, 20 (05): : 1257 - 1275
  • [27] Neural-network-based synchronous iteration learning method for multi-player zero-sum games
    Song, Ruizhuo
    Wei, Qinglai
    Song, Biao
    NEUROCOMPUTING, 2017, 242 : 73 - 82
  • [28] Analysis of Peer Interaction Features in Multi-Player Learning Games: Semi-Automatic Approach and Proof of Concept
    Guinebert, Mathieu
    Yessad, Amel
    Muratet, Mathieu
    Luengo, Vanda
    PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON GAMES BASED LEARNING (ECGBL 2018), 2018, : 162 - 170
  • [29] Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations
    Vamvoudakis, Kyriakos G.
    Lewis, Frank L.
    AUTOMATICA, 2011, 47 (08) : 1556 - 1569