Unifying convergence and no-regret in multiagent learning

Times cited: 0
Authors
Banerjee, Bikramjit [1]
Peng, Jing [1]
Affiliation
[1] Tulane University, Department of Electrical Engineering and Computer Science, New Orleans, LA 70118, USA
DOI
10.1007/11691839_5
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a new multiagent learning algorithm, RV_σ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents or (c) convergence to Nash equilibrium policies in self-play. However, it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents, and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RV_σ(t), which depends explicitly on time, can overcome both of these assumptions. Consequently, RV_σ(t) theoretically achieves (a') convergence to near-best response against eventually stationary opponents, (b') no-regret payoff against arbitrary opponents, and (c') convergence to some Nash equilibrium policy in self-play in some classes of games. Each agent now needs to know only its portion of any equilibrium, and it does not need to distinguish among non-stationary opponent types. To our knowledge, this is also the first successful attempt at convergence of a no-regret algorithm in the Shapley game.
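The no-regret property in (b') can be illustrated with a small self-play experiment. The sketch below is NOT RV_σ(t), because its update rule is not given in this record. Instead it substitutes the standard Hedge (exponential weights) learner. The following are all assumptions for illustration:

- the time-dependent learning rate η_t = sqrt(ln n / t);
- the Shapley game payoff matrices, which follow one common formulation;
- the hypothetical initial policies `prior1`/`prior2`, which break symmetry (from uniform policies both learners simply stay at the uniform equilibrium);
- the helper names `hedge_policy` and `self_play`.

The sketch shows only that each agent's average external regret shrinks over time. It does not demonstrate policy convergence in the Shapley game, which the abstract reports as the new result.

```python
import math

# Assumed payoffs for one common formulation of the Shapley game.
# Rows index player 1's action, columns index player 2's action.
ROW_PAYOFF = [[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
COL_PAYOFF = [[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]]


def hedge_policy(cum_gains, t, prior):
    """Exponential weights over cumulative gains with assumed rate eta_t = sqrt(ln n / t)."""
    n = len(cum_gains)
    eta = math.sqrt(math.log(n) / t)
    logits = [math.log(pi) + eta * g for pi, g in zip(prior, cum_gains)]
    top = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def self_play(steps, prior1=(0.6, 0.2, 0.2), prior2=(1 / 3, 1 / 3, 1 / 3)):
    """Run two Hedge learners against each other; return their average external regrets."""
    gains1 = [0.0, 0.0, 0.0]  # cumulative expected payoff of each pure action, player 1
    gains2 = [0.0, 0.0, 0.0]  # same for player 2
    earned1 = earned2 = 0.0   # cumulative expected payoff actually received
    for t in range(1, steps + 1):
        p = hedge_policy(gains1, t, prior1)
        q = hedge_policy(gains2, t, prior2)
        # Expected payoff of each pure action against the opponent's current mixed policy.
        u1 = [sum(ROW_PAYOFF[i][j] * q[j] for j in range(3)) for i in range(3)]
        u2 = [sum(COL_PAYOFF[i][j] * p[i] for i in range(3)) for j in range(3)]
        earned1 += sum(p[i] * u1[i] for i in range(3))
        earned2 += sum(q[j] * u2[j] for j in range(3))
        for k in range(3):
            gains1[k] += u1[k]
            gains2[k] += u2[k]
    # External regret: best fixed action in hindsight minus the payoff actually earned.
    return (max(gains1) - earned1) / steps, (max(gains2) - earned2) / steps


if __name__ == "__main__":
    for steps in (100, 1000, 10000):
        r1, r2 = self_play(steps)
        print(f"T={steps:>5}: avg regret p1={r1:.4f}, p2={r2:.4f}")
```

A decreasing learning rate keeps the sketch "anytime", so it needs no knowledge of the horizon T. This loosely mirrors the abstract's point that a learning rate depending explicitly on time is what matters. However, the specific schedule used by RV_σ(t) may differ.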
Pages: 100-114
Number of pages: 15